Abstract

Robust  parameter  design  (RPD)  is  implemented  in  systems  in  which  a  user  wants 
to  minimize  the  variance  of  a  system  response  caused  by  uncontrollable  factors  while 
obtaining  a  consistent  and  reliable  system  response  over  time.  Typically,  quadratic 
regression is deemed sufficient to specify a process model of system behavior.

We propose the use of artificial neural networks (ANNs) to compensate for highly
non-linear problems that quadratic regression fails to accurately model.

RPD  is  conducted  under  the  assumption  that  the  relationship  between  the  system 
response  and  controllable  and  uncontrollable  variables  does  not  change  over  time.  Since 
degradation in the system response will almost certainly occur, this assumption will
inevitably  be  violated.  We  propose  a  methodology  to  find  a  new  set  of  settings  that  will 
be  robust  to  moderate  system  degradation  while  remaining  robust  to  noise  variables 
within  the  system.  An  algorithm  is  presented  for  this  enhanced  RPD  analysis  utilizing 
both  quadratic  regression  and  two  specific  artificial  neural  network  architectures. 

RPD  has  been  well  developed  on  single  response  problems.  Sparse  literature 
exists  on  dealing  with  multiple  responses  in  RPD  and  most  methods  utilize  a  subjective 
weighting  scheme.  To  account  for  multiple  responses,  we  examine  the  use  of  factor 
analysis  on  the  response  data.  Linear  combination  techniques  are  also  developed  in  the 
case  that  more  than  a  single  factor  is  retained  in  the  analysis. 

All  the  proposed  techniques  are  applied  to  textbook  applications  to  demonstrate 
their utility. An Air Force application problem is then examined to demonstrate the new
technique’s potential on a real-world problem that is highly non-linear. The application
is  a  detector  developed  to  detect  anomalies  within  hyper-spectral  imagery. 

The  results  of  this  research  include  successful  implementation  of  artificial  neural 
networks  in  RPD.  These  artificial  neural  networks  can  be  utilized  when  faced  with  a 
highly  non-linear  problem.  Also,  new  settings  are  developed  that  are  shown  to  be 
superior to traditional robust settings when a system is subject to performance
degradation.  A  new  methodology  of  approaching  multiple  response  problems  is 
developed  which  shows  promise.  Finally,  the  anomaly  detector  is  further  enhanced 
through  the  use  of  artificial  neural  networks  to  determine  robust  settings  and  alternate 
settings when degradation is expected.

NEURAL EXTENSIONS TO ROBUST PARAMETER DESIGN


I.  Introduction 


1.1  General  Discussion 

Robust parameter design (RPD) determines a set of control variable settings that
minimize the variance of the response caused by different sources of noise in a system
while satisfying the constraint on the mean (Myers & Montgomery, 2002). The underlying
assumption is that second-order models capture the mean and variance of the system
response and that these models do not change over time.

Many  Air  Force  applications  involve  modeling  systems  with  a  large  number  of 
control settings outputting multiple responses. One such application is the autonomous
global anomaly detector (AutoGAD) created by Johnson (2008). In its
current version, the detection algorithm implemented in MATLAB® contains eleven
control variables and four responses. Davis (2009) first applied RPD on AutoGAD to
determine robust control settings and promising results were realized. However,
analysis of variance (ANOVA) suggested that quadratic regression was inadequate to
predict true response values.

In  AutoGAD,  RPD  assumes  that  new  information  will  closely  resemble  the 
training  data.  This  may  not  be  an  appropriate  assumption  in  many  applications.  For 
example,  the  detection  algorithm  could  encounter  an  image  “noisier”  than  any  image  in 
its  library.  In  the  case  of  AutoGAD,  such  an  occurrence  corresponds  to  “system 
degradation.”

The main objective of this research is to determine a new set of control variable
settings  that  are  not  only  robust  to  noise  in  the  system,  but  also  are  robust  to  a  moderate 
amount  of  system  degradation.  In  mechanical  systems,  this  degradation  could  be  of  a 
physical nature. In software systems, the degradation takes the form of exposure to
inputs beyond the system’s experience and training. A further objective of this research
is to evaluate artificial neural networks as an alternative to quadratic regression to
model the process. Finally, the problem of multiple responses is explored.

1.2  Motivation 

The  motivation  behind  this  research  is  to  apply  appropriate  robust  settings  to  the 
detection  algorithm  to  improve  its  performance.  The  detection  algorithm  is  currently 
employed  based  on  the  settings  suggested  by  Johnson  (2008).  These  settings  were  based 
on  the  experience  of  the  author  and  tended  to  maximize  only  one  of  the  four  available 
responses in detection. Further, the settings were based on only eight given images, so
the author used the same images in the testing set as in the training set. This situation
typically leads to an overly optimistic view of system performance.

1.3  Research  Goals 

The  first  goal  of  this  research  was  to  determine  a  new  set  of  “doubly  robust” 
settings  that  are  robust  to  noise  variables  and  system  degradation.  The  algorithm 
constructed  in  this  research  was  applied  to  the  detection  algorithm,  AutoGAD,  created  by 
Johnson (2008). These doubly robust settings were compared with traditional
RPD settings in the presence of system degradation.

A secondary goal of this research was to examine the use of artificial neural
networks  (ANNs)  as  a  substitute  for  quadratic  regression  in  RPD.  These  ANNs  were 
compared with quadratic regression to determine their ability to better fit highly
non-linear models. Robust and doubly robust settings were also calculated.

A  third  goal  of  this  research  was  to  determine  a  new  method  of  combining 
multiple  responses  into  a  single  dimension.  Factor  analysis  was  explored  as  the 
appropriate  technique  in  determining  commonalities  among  various  responses.  Finally,  if 
more  than  one  factor  was  retained,  linear  combination  methods  were  suggested  to 
combine  multiple  factor  scores  into  a  single  dimension. 

1.4  Proposed  Research  Contributions 

An algorithm to determine new “doubly robust” settings that can be applied to
problems containing control and noise variables was developed in this research. To solve
the dual response problem of RPD, the methodology suggested by Lin & Tu (1995) was
utilized within the framework, which can be extended to include other
methods (Robinson et al., 2004).

Another  contribution  from  this  research  is  the  development  of  ANN  approaches  to 
RPD,  as  well  as  system  degradation  in  RPD.  Radial  basis  function  neural  networks  and 
generalized  regression  neural  networks  were  explored  in  RPD.  Two  different  methods  of 
processing  response  data  generated  for  RPD  research  were  discussed,  which  can  be 
utilized  with  combined  or  crossed  array  designs.  Successful  applications  of  the  neural 
networks and response data processing approaches to highly non-linear problems are
presented. The use of ANNs is appropriate when quadratic regression fails to perform
well. 

Finally, an alternative method of reducing multiple responses to a single
dimension was derived. The method does not rely on subject-matter expert input; it uses
linear combination techniques to turn a multiple response problem into a single dimension
for easier RPD analysis.

1.5  Organization  of  Dissertation 

The  following  is  the  organization  of  the  dissertation.  A  summary  of  current 
literature  pertinent  to  robust  parameter  design,  artificial  neural  networks,  multiple 
responses,  and  factor  analysis  is  provided  in  Chapter  2.  The  literature  provides  a  basis 
upon  which  these  techniques  are  expanded  and  applied  to  RPD. 

In  Chapter  3,  an  algorithm  is  developed  to  determine  doubly  robust  settings  in 
RPD and an example problem is provided. In this chapter, ANNs and quadratic regression
are contrasted as applied to RPD problems. Finally, factor analysis
is  applied  to  the  response  data  to  reduce  the  dimensionality  of  the  responses.  If  factor 
analysis  yields  more  than  one  factor,  eight  linear  combination  methods  are  suggested  to 
reduce  these  factors  to  a  single  dimension.  A  five  response  problem  is  explored  to 
demonstrate  the  use  of  factor  analysis  on  multiple  responses. 

In  Chapter  4,  the  technique  of  doubly  robust  settings  is  applied  to  a  current 
detection  algorithm  employed  by  the  Air  Force.  Also,  ANNs  are  implemented  in  this 
same  detection  algorithm  to  demonstrate  their  superiority  to  quadratic  regression  in 
highly non-linear problems. Finally, the factor analysis techniques are applied to the four
responses produced from the detection algorithm to show their usefulness over simply
summing  the  normalized  response  data.  Results  are  summarized  and  discussed  for  all 
techniques. 

In  Chapter  5,  the  contributions  and  conclusions  of  this  research  are  presented. 
Also, recommendations for future work are explored.

II. Literature Review


2.1  Overview 

Topical  areas  pertinent  to  this  research  are  highlighted  in  this  chapter.  These 
areas  include  robust  parameter  design,  neural  networks,  multiple  response  problems,  and 
factor analysis. A brief background on each of the specified areas is given to demonstrate
current knowledge in each field, thus building a foundation on which
to extend this work into unexplored areas.

2.2  Robust  Parameter  Design 

When  performing  experiments  that  contain  controllable  and  uncontrollable 
parameters,  robust  parameter  design  (RPD)  is  implemented  to  obtain  a  desired  output 
value while minimizing the variance caused by the settings of the controls and the various
sources of noise in the system (Myers et al., 2002). Genichi Taguchi (1986) first introduced
the method of RPD to the United States in the 1980s. Taguchi’s approach generated much
controversy, but new response surface methods have since been developed.
These methods are more accepted in the statistical and engineering communities. The
use  of  RPD  has  extended  to  a  wide  array  of  experimental  designs  and  has  been 
implemented  in  the  practice  of  many  top  companies  such  as  AT&T,  Ford,  and  Xerox 
(Myers  &  Montgomery,  2002). 

Myers  &  Montgomery  (2002)  summarize  four  different  focuses  of  RPD.  The  first 
focus  involves  designing  a  system  that  is  fairly  insensitive  to  environmental  factors  once 
the system becomes operational. The second focus is to design the system to be
insensitive to variability caused when the system becomes operational. The third focus of
RPD  is  to  design  the  system  as  close  as  possible  to  the  specifications  desired  by  the  user. 
Finally,  the  conditions  of  the  system  should  be  set  to  achieve  a  target  value  while 
minimizing  the  variance  present  around  the  target  value. 

Primarily, RPD is employed based on the fourth focus, which achieves a
target value while minimizing the variability. To perform RPD, one must understand the
variables involved in the system (Brenneman & Myers, 2003). Two types of variables
exist:  controllable,  denoted  by  x,  and  uncontrollable  (noise),  denoted  by  z.  The  control 
variables  of  the  system  are  those  variables  the  user  is  able  to  set.  Noise  variables  are 
those  variables  present  that  the  user  cannot  control  and  may  be  known  or  unknown.  RPD 
is  used  if  noise  variable  settings  produce  different  outputs  of  the  system  when 
combinations  of  control  settings  are  selected.  Thus  the  response,  Y,  is  assumed  to  be  a 
function  of  the  controllable  variables  and  the  noise:  Y  =  f(x,z). 

The  primary  interest  in  the  two  types  of  variables  lies  in  the  interaction  between 
the two. If the noise variables do not interact with the control variables, then the
variance of the response is the same at every control setting and the need for RPD is moot.
However, if an interaction between the two types of variables exists, then RPD is employed
to determine which settings of the control variables should be utilized to minimize the
variance. Brenneman & Myers (2003) present Figure 1 below to demonstrate the interaction of
control and noise. It is seen that if the control variable is set at “high” then noise has
no real effect on the response, thus the response is essentially constant for this setting.
However, at a setting of “low” for the control variable, significant variance exists in the
system and this setting is not desired since the output will be inconsistent.

Figure 1. Control and Noise Variable Interaction Plot
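
The behavior shown in Figure 1 can be illustrated numerically. The sketch below is only an illustration: the response function and its coefficients are hypothetical (they are not taken from Brenneman & Myers) and were chosen so that the control-by-noise interaction cancels the noise effect at the high control setting.

```python
import statistics


def response(x, z):
    """Hypothetical response with a control-by-noise interaction.

    y = 10 + 2x + 3z - 3xz, so the slope in z is 3(1 - x):
    zero at x = +1 (high) and 6 at x = -1 (low).
    """
    return 10.0 + 2.0 * x + 3.0 * z - 3.0 * x * z


# Coded noise settings at which the response is observed (assumed values).
noise_levels = [-1.0, -0.5, 0.0, 0.5, 1.0]


def response_variance(x):
    """Population variance of the response over the noise levels at control setting x."""
    return statistics.pvariance([response(x, z) for z in noise_levels])


var_high = response_variance(+1.0)
var_low = response_variance(-1.0)
print(f"variance at high setting: {var_high:.3f}")
print(f"variance at low setting:  {var_low:.3f}")
```

With these assumed coefficients the response does not vary with the noise at the high setting, while the low setting passes the full noise effect through to the output.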

Two  design  approaches  exist  to  perform  RPD  on  a  process.  The  first  design, 
developed  by  Taguchi,  utilizes  a  crossed  array  design.  The  second  design,  developed  by 
the  response  surface  community,  utilizes  a  combined  array  design.  An  in-depth 
examination  into  the  two  designs  will  be  presented  in  this  document  as  well  as 
comparisons  and  contrasts. 

2.2.1  Crossed  Array  Design 

Taguchi  suggested  an  orthogonal  array  consisting  of  control  variables  to  be 
crossed  with  an  orthogonal  array  of  noise  variables,  which  generated  a  crossed  array 
design  (Myers  et  al.,  2002).  The  outer  and  inner  array  designs  can  be  full  or  fractional 
factorial, but an outer array design must be performed at all of the inner array points. For
example, an experiment with two control variables and two noise variables, with full
factorial designs on both arrays, consists of $2^2 \times 2^2 = 16$ design points. Figure 2 below
displays this example of a crossed array design.


Figure 2. Crossed Array Design (Myers & Montgomery, 2002)

As another example, a three control and two noise variable problem, with full
factorials on both the inner and outer arrays, produces a design matrix of size
$2^3 \times 2^2 = 32$. The number of control variables increased by one and the number of design
points doubled. Thus, this design can lead to a large number of runs if there exist
several control and noise variables or if more than two levels are chosen for each factor.
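
The run counts above can be reproduced with a short sketch. The function names are hypothetical and full factorials in coded units are assumed for both arrays; fractional designs would need a different generator.

```python
from itertools import product


def full_factorial(n_factors, levels=(-1, 1)):
    """All level combinations for n_factors factors, in coded units."""
    return list(product(levels, repeat=n_factors))


def crossed_array(n_control, n_noise, levels=(-1, 1)):
    """Run every outer (noise) array point at every inner (control) array point."""
    inner = full_factorial(n_control, levels)
    outer = full_factorial(n_noise, levels)
    return [(x, z) for x in inner for z in outer]


runs_2x2 = crossed_array(n_control=2, n_noise=2)
runs_3x2 = crossed_array(n_control=3, n_noise=2)
runs_three_level = crossed_array(n_control=2, n_noise=2, levels=(-1, 0, 1))
print(len(runs_2x2), len(runs_3x2), len(runs_three_level))
```

The two-level designs give 16 and 32 runs, matching the examples in the text, while moving the two-control, two-noise problem to three levels per factor raises the count to 81.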

Reduced  designs,  in  terms  of  resolution,  can  be  used  on  the  inner  and  outer  arrays, 
but  the  issue  remains  that  the  outer  array  must  be  performed  at  each  inner  array  design 
point.  This  particular  design  allows  the  user  to  understand  any  control  by  noise 
interactions  that  may  exist,  but  limits  the  ability  to  understand  control  by  control  or  noise 
by noise interactions.

To analyze the data in an experiment, Taguchi suggests the use of a summary
statistic  on  the  outer  array  known  as  the  signal-to-noise  ratio  (SNR).  SNR  values 
summarize  the  mean  and  variance  into  a  single  statistic.  Taguchi  presents  four  different 
SNRs  to  use  on  three  different  instances.  For  development  of  all  of  the  SNR  equations, 
the following quadratic loss function is used (where $y = f(x, z)$):

$L = E_z (y - t)^2$ (2.1)

$E_z$ is the expectation operator over the random variable z and t represents the target
value on the mean. The different SNR equations are described below
and adapted from Myers & Montgomery (2002). The first instance minimizes the
response and has a quadratic loss function, $E_z (y - 0)^2$, where t is zero due to minimizing
the response (assuming the response is nonnegative). This loss function leads to the
following equation:

$SNR_s = -10 \log \sum_{i=1}^{n} \frac{y_i^2}{n}$ (2.2)

This equation sums the squared errors, divides that number by the number of
outer array points, and then sends it through a -10 log transformation. Due to the
transformation, the maximum SNR is desired. Using the -10 log (base 10) transformation
allows the user to maximize the SNR value regardless of whether the problem is a
minimization, maximization, or target value problem.

The second instance maximizes the response and replaces $y_i$ with $1/y_i$ in
Equation (2.2). This allows the quadratic loss function to approach zero as y increases.
The resulting SNR is given by the equation:

$SNR_l = -10 \log \sum_{i=1}^{n} \frac{1/y_i^2}{n}$ (2.3)

A  target  mean  value  is  achieved  in  the  third  instance.  Two  scenarios  exist  when 
performing this SNR calculation. In the first scenario, the response mean and
variance can be altered independently (Myers & Montgomery, 2002: 541). The control
(tuning) factors that have no effect on variance are adjusted to obtain the mean and, thus,
the variance is not affected. Once this step is complete, the remaining factors are
tuned to maximize the SNR, thus minimizing variance in the system. The resulting SNR
equation  uses  only  a  transformation  of  sample  variance  and  is  given  by: 

$SNR_{(\mathrm{target})} = -10 \log s^2$ (2.4)

The  second  scenario  exists  when  the  response  variance  is  related  to  the  response 
mean.  The  user  desires  a  linear  relationship,  but  this  may  not  always  be  the  case.  As 
before,  the  control  (tuning)  factors  are  set  and  the  remaining  factors  utilize  a  maximized 
SNR  value  to  obtain  minimum  variance.  The  resulting  SNR  value  is  given  by: 

$SNR_T = 10 \log \left( \bar{y}^2 / s^2 \right)$ (2.5)
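
The signal-to-noise ratios can be computed directly from the responses observed across the outer array at one inner-array point. The sketch below is illustrative only: the function names and data are hypothetical, base-10 logarithms are assumed throughout, and the last function uses the standard textbook form $10 \log(\bar{y}^2 / s^2)$ for the scenario in which the variance is related to the mean, which is an assumption about the intended equation.

```python
import math
import statistics


def snr_smaller_is_better(y):
    """Minimization case: -10 log10 of the mean squared response."""
    return -10.0 * math.log10(sum(v ** 2 for v in y) / len(y))


def snr_larger_is_better(y):
    """Maximization case: the minimization form with each y_i replaced by 1 / y_i."""
    return -10.0 * math.log10(sum(1.0 / v ** 2 for v in y) / len(y))


def snr_target_variance_only(y):
    """Target case, mean and variance independent: -10 log10 of the sample variance."""
    return -10.0 * math.log10(statistics.variance(y))


def snr_target_mean_related(y):
    """Target case, variance related to the mean (assumed form): 10 log10(ybar^2 / s^2)."""
    return 10.0 * math.log10(statistics.mean(y) ** 2 / statistics.variance(y))


# Hypothetical responses across the outer array at a single inner-array point.
outer_array_responses = [9.0, 10.0, 11.0]
for snr in (snr_smaller_is_better, snr_larger_is_better,
            snr_target_variance_only, snr_target_mean_related):
    print(snr.__name__, round(snr(outer_array_responses), 3))
```

In every case a larger SNR is preferred, which is why the sign of the transformation differs between the variance-only and mean-related target forms.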

Factor  plots  can  visually  assist  in  the  selection  of  the  control  variables.  Along 
with  plotting  the  SNR  values,  the  process  means  can  be  plotted  to  determine  which 
control settings are best to use. When performing the analysis, Taguchi’s method
suggests  setting  the  SNR  value  at  its  maximum  first,  and  then  choosing  the  appropriate 
mean  value  setting.  Using  the  two  plots  in  conjunction  with  one  another  should  lead  to  a 
favorable answer in achieving a target mean while minimizing variance.

The use of the SNR statistic has been widely criticized, largely due to its inability
to distinguish between the effects of the mean and variance in the process. Another
criticism of the SNR usage is the issue of units. The only unit-less equation is Equation
(2.5), $SNR_T$. Finally, as discussed previously, crossed array designs can become very
large as more controllable and noise factors are added to the design.

2.2.2  Combined  Array  Design 

A  combined  array  design  utilizes  a  smaller  number  of  design  points  than  the 
crossed  array  design  while  continuing  to  capture  the  design  space  of  the  variables 
(Myers & Montgomery, 2002). This differs from crossed array designs because not every
combination of control variables needs to be tested across every combination of
noise variables. Rather, a design is constructed to test selected points in the design
region that still provide appropriate results. These types of
designs are smaller in terms of number of runs compared to crossed array designs.

Response  surface  methodology  utilizing  quadratic  regression  will  be  used  to  solve 
problems  with  combined  array  designs.  Therefore,  when  choosing  an  experimental 
design for combined array designs, one must consider designs appropriate for
second-order models (Myers & Montgomery, 2002). Myers & Montgomery (2002: 304) state
that  several  important  properties  are  necessary  in  the  selection  of  designs.  These  four 
properties  are  utilized  when  selecting  designs  in  this  research: 

1. Result in a good fit of the model to the data
2. Give sufficient information for lack of fit
3. Provide an estimate of “pure” experimental error
4. Be cost-effective

Many second-order designs have been established and variations of these designs
are  continually  developed.  This  research  primarily  focuses  on  the  central  composite 
design (CCD). First introduced by Box & Wilson (1951), a CCD with k variables uses F
factorial points, 2k axial points, and $n_c$ center runs (Myers & Montgomery, 2002). The
factorial points allow for an estimation of linear and interaction terms, the axial
points allow for estimation of the pure quadratic terms, and the center runs provide an
estimate of pure error and contribute to the estimation of the quadratic terms (test for
curvature). A typical CCD design is depicted in Figure 3
where the axial distance is equal to one ($\alpha = 1$, where $\alpha$ denotes the axial distance).


Figure  3.  Central  Composite  Design  with  Axial  Points  =  1 

Axial points equal to a value of one are shown in Figure 3. This is a special case
of CCD known as face-centered. A face-centered cube is used when the design points
represent the absolute bounds of the variable settings. Settings lower or higher
than these bounds cannot be used in experimentation, since
unstable results would occur outside the bounds set by $\alpha$. However, if setting
values can be tested outside the low/high settings, then axial points greater than one can
be tested. The axial distance varies from 1 to $\sqrt{k}$.

Other second-order designs exist for use in RPD. If three levels are practical, a
$3^k$ factorial design can be employed. As a special case, the $3^2$ factorial is indeed a
face-centered CCD. Other designs include Box-Behnken, Equiradial, D-Optimal, etc., but
only  CCD  or  full  factorials  are  utilized  in  this  research  due  to  their  efficiency  in 
developing  good  quadratic  fits  and  small  design  size  (with  the  CCD). 
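
The CCD construction described above can be sketched in a few lines. The function name is hypothetical, and a full $2^k$ factorial is assumed for the factorial portion (so F = $2^k$); a fractional factorial portion would need a different generator. The sketch also checks the special case noted above, that the $3^2$ factorial coincides with the face-centered CCD in two variables.

```python
from itertools import product


def central_composite_design(k, alpha=1.0, n_center=1):
    """Coded CCD points: 2^k factorial, 2k axial, and n_center center runs."""
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha  # one factor at +/- alpha, the rest at center
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center


# Face-centered CCD (alpha = 1) in two variables with one center run.
ccd = central_composite_design(k=2, alpha=1.0, n_center=1)
three_level = [tuple(float(v) for v in p) for p in product((-1, 0, 1), repeat=2)]
print(len(ccd), sorted(ccd) == sorted(three_level))
```

For k = 2 the face-centered design has 4 + 4 + 1 = 9 points, the same nine points as the three-level factorial.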

2.2.3  Response  Surface  Methodology 

Response  surface  methodology  has  extended  the  ideas  of  Taguchi  to  be  applicable 
to  RPD.  Using  response  surface  methodology,  an  understanding  of  the  relationship 
between  process  mean  and  variance  becomes  useful  in  choosing  more  appropriate  control 
settings.  Combined  array  designs,  discussed  in  the  previous  section,  are  implemented 
rather  than  crossed  array  designs  due  to  their  small  size  and  efficiency  in  providing 
appropriate  quadratic  regression  fits.  However,  if  the  problem  is  small  and  inexpensive 
to  test,  a  crossed  array  design  is  more  beneficial  due  to  increased  sampling  of  the  process. 

Two  approaches  exist  to  determine  the  mean  and  variance  models;  these  are 
known  as  the  “single  model”  and  the  “dual  model”  approaches.  Both  methods  have  been 
shown  to  be  effective  and  can  be  used  interchangeably  (Robinson  et  al.,  2004).  The  dual 
model  approach  develops  a  mean  model  and  variance  model  separately  based  on 
collected or historical data (Myers & Carter, 1973). The single model approach differs
from the dual model in that it develops an overall process model from which the mean
and  variance  models  are  derived. 

In  the  dual  model  approach,  the  mean  and  variance  models  are  created  separately 
from  experimental  testing.  Data  is  collected  on  both  the  mean  and  the  variance  of  a 
particular  response.  By  having  these  two  data  types,  different  models  are  created 
separately to determine the mean and variance values pertaining to specific control
settings.  Drawbacks  to  this  approach  usually  involve  the  need  for  a  crossed  array  design, 
thus,  a  larger  number  of  runs.  However,  if  the  data  exists,  this  approach  may  prove 
useful  in  attaining  a  more  appropriate  and  exact  variance  model. 
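
In the dual model approach, the raw material for the two models is the sample mean and sample variance of the response at each control setting. A minimal sketch with hypothetical crossed-array data follows; the data values and the function name are illustrative assumptions, and fitting the two regression models to these summaries is left out.

```python
import statistics

# Hypothetical crossed-array data: each control setting (inner-array point)
# maps to the responses observed across the outer (noise) array.
observations = {
    (-1, -1): [12.1, 14.8, 9.7, 13.0],
    (-1, 1): [15.2, 15.9, 14.6, 15.5],
    (1, -1): [20.3, 26.1, 17.4, 23.8],
    (1, 1): [18.0, 18.4, 17.9, 18.1],
}


def summarize(observations):
    """Return (control setting, sample mean, sample variance) per inner-array point."""
    return [
        (x, statistics.mean(y), statistics.variance(y))
        for x, y in observations.items()
    ]


for x, mean_y, var_y in summarize(observations):
    print(x, round(mean_y, 3), round(var_y, 3))
```

The mean column would serve as the response for the mean model and the variance column (often after a log transformation) as the response for the variance model.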

The  single  model  approach  applies  the  response  surface  methodology  technique 
of  developing  an  overall  process  model  based  on  collected  data  and  then  derives  the  mean 
and  variance  models  from  the  process  model.  Typically,  a  second  order  model  is  applied 
to  model  the  interactions  between  control  variables  and  noise  variables,  as  well  as  control 
variables  with  themselves.  Myers  &  Montgomery  (2002)  present  the  equation  for  the 
overall  process  model  as: 

$y(x, z) = \beta_0 + x'\beta + x'Bx + z'\gamma + x'\Delta z + \varepsilon$ (2.6)

The x’s represent the control variables (settings) and the z’s represent the noise
variables. The symbol $\beta$ is a vector of coefficients for the control main effects and
$\gamma$ is the vector of coefficients for the noise main effects. The matrix of coefficients
B contains the quadratic control effects and $\Delta$ is the matrix of coefficients for the
control by noise interactions. Finally, $\varepsilon$ is distributed normally with a mean of
zero and variance $\sigma^2$, which is estimated by the mean-square error (MSE). This model is
“broken down” to obtain the mean and variance response models. The mean and variance
equations, adapted from Myers & Montgomery (2002: 563), are presented below:

Mean Response Model: $E_{z,\varepsilon}[y(x, z)] = \beta_0 + x'\beta + x'Bx$ (2.7)

Variance Response Model: $V_{z,\varepsilon}[y(x, z)] = \sigma_z^2 (\gamma + \Delta'x)'(\gamma + \Delta'x) + \sigma^2$ (2.8)

An assumption is made in the variance response model that the variance-covariance
matrix of the noise variables is given by $\mathrm{cov}(z) = \sigma_z^2 I$. This assumption allowed
Myers & Montgomery (2002) to derive the variance response model in Equation (2.8).
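
The step from the overall process model to the variance response model can be sketched as follows. Here $\gamma$ is the vector of noise main-effect coefficients, $\Delta$ is the matrix of control-by-noise interaction coefficients, and $\varepsilon$ is the model error with variance $\sigma^2$. The sketch assumes the control settings x are fixed, $E(z) = 0$, and z and $\varepsilon$ are independent; these are the usual coded-noise assumptions and are stated here for the sketch.

```latex
\begin{align*}
y(x,z) - E_{z,\varepsilon}[y(x,z)] &= (\gamma + \Delta' x)' z + \varepsilon \\
V_{z,\varepsilon}[y(x,z)] &= (\gamma + \Delta' x)' \operatorname{cov}(z) \, (\gamma + \Delta' x) + \sigma^2 \\
&= \sigma_z^2 \, (\gamma + \Delta' x)' (\gamma + \Delta' x) + \sigma^2
\end{align*}
```

The vector $\gamma + \Delta' x$ is the slope of the response in the noise directions, so the variance contributed by the noise vanishes at any control setting where this slope is zero.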

The  mean  response  model  is  directly  extracted  from  the  overall  process  model. 
When  the  control  variables  are  set,  the  same  result  will  be  achieved  on  the  average  for 
that  particular  selection  of  settings. 

The variance model does not offer quite as direct an interpretation as the mean
model. However, the model is only in terms of control variables. Only the coefficients
of the noise variables and their interactions are used. Although most of the derivation is
fairly straightforward matrix algebra, two components need explanation (Myers
& Montgomery, 2002). The variance of the noise, $\sigma_z^2$, is related to the coded bounds of
the noise variables. Typically this value is assumed to be 1 since the tested bounds of
noise variables are between [-1, 1]. However, this variance can change based on the
bounds of the noise variables, thus attaining a value of 2, 1/2, 3, 2/3, etc. This issue
requires further research to determine an optimal setting for $\sigma_z^2$. Finally, $\sigma^2$ is
taken directly as the error for regression given by the overall process model.
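
The mean and variance response models can be evaluated at any control setting once the process model coefficients are available. The sketch below uses plain Python lists; the helper names and all coefficient values are hypothetical, with two control variables and one noise variable assumed for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mean_model(x, beta0, beta, B):
    """Mean response: beta0 + x'beta + x'Bx."""
    return beta0 + dot(x, beta) + dot(x, mat_vec(B, x))


def variance_model(x, gamma, Delta, sigma_z2=1.0, sigma2=0.0):
    """Variance response: sigma_z^2 (gamma + Delta'x)'(gamma + Delta'x) + sigma^2."""
    slope = [g + d for g, d in zip(gamma, mat_vec(transpose(Delta), x))]
    return sigma_z2 * dot(slope, slope) + sigma2


# Hypothetical coefficients: two control variables, one noise variable.
beta0, beta = 10.0, [2.0, -1.0]
B = [[1.0, 0.5], [0.5, -2.0]]
gamma = [3.0]
Delta = [[-3.0], [1.0]]  # rows: control variables, columns: noise variables

for x in ([1.0, 0.0], [-1.0, 0.0]):
    print(x, mean_model(x, beta0, beta, B), variance_model(x, gamma, Delta, sigma2=0.25))
```

With these assumed coefficients the noise slope is zero at x = (1, 0), so only the regression error variance remains there, whereas x = (-1, 0) passes the noise through with a slope of 6.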

The  single  model  approach  allows  for  the  use  of  a  combined  array  design  with 
CCDs. Typically, for RPD, an optimization program is proposed in choosing the control
settings that best achieve the target mean while minimizing variance. Myers &
Montgomery (2002) present an optimization problem as suggested by Vining & Myers
(1990) which is:

$\min_{x \in D} V_{z,\varepsilon}[y(x, z)]$
$\text{s.t. } E_{z,\varepsilon}[y(x, z)] = m$ (2.9)

This  concludes  the  development  of  choosing  a  design,  obtaining  a  mean  and 
variance  model,  and  establishing  the  optimization  problem  for  RPD.  All  that  remains  is 
choosing  the  control  settings  that  optimize  Equation  (2.9).  This  problem  involves  solving 
the  mean  and  variance  models  simultaneously  and  therefore  is  a  dual  response  problem. 
Section  2.2.4  addresses  different  methods  proposed  in  solving  the  dual  response  problem 
that  have  been  applied  in  RPD  research. 

2.2.4  Solving  the  Dual  Response  Problem 

Different  versions  of  Equation  (2.9)  are  used  but  the  general  idea  is  given  in  the 
optimization  problem  presented  above  (Tang  &  Xu,  2002;  Robinson  et  al.,  2004;  Shaibu 
&  Cho,  2009).  Myers  &  Montgomery  (2002)  present  a  step-wise  approach  to  solving  the 
optimization  problem  that  is  much  like  the  Taguchi  approach  for  solving  the  dual 
response  problem  with  SNRs  and  mean  outputs. 

First, all possible combinations of settings (at a given step size) are applied to the
models, obtaining a mean and variance response for each combination of settings. The
control settings associated with the minimum variance value are chosen, provided the
mean response satisfies the constraint in Equation (2.9). If the constraint is not satisfied,
the combination of control settings with the next lowest variance value is chosen. This
procedure is repeated until the mean response constraint in Equation (2.9) is satisfied.
The resultant solution is a unique combination of control variable settings that achieves a
target mean with minimum variance across noise in the system.
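
The step-wise procedure can be sketched as a grid search. The following is a minimal Python sketch, not the implementation used by the cited authors. The fitted models `mean_model` and `var_model`, the tolerance `tol` on the mean constraint (an exact equality is rarely attainable on a discrete grid), and the coded range of [-1, 1] for the control variables are all assumptions made for illustration.

```python
import itertools

def stepwise_rpd(mean_model, var_model, target, levels, n_vars, tol=0.05):
    """Return (settings, mean, variance) with the lowest variance whose mean
    is within tol of target, or None if no grid point qualifies."""
    # Evaluate both models at every combination of settings on the grid.
    grid = itertools.product(levels, repeat=n_vars)
    scored = [(var_model(x), mean_model(x), x) for x in grid]
    # Walk the candidates from lowest to highest variance and stop at the
    # first one that satisfies the mean constraint.
    for var, mean, x in sorted(scored, key=lambda s: s[0]):
        if abs(mean - target) <= tol:
            return x, mean, var
    return None

# Hypothetical fitted models, for illustration only.
def mean_model(x):
    return 10.0 + 2.0 * x[0] + 1.0 * x[1]

def var_model(x):
    return 1.0 + x[0] ** 2 + x[1] ** 2

levels = [i / 10 for i in range(-10, 11)]  # step size 0.1 on [-1, 1]
best = stepwise_rpd(mean_model, var_model, target=11.0, levels=levels, n_vars=2)
```

In this toy case the unconstrained minimum variance lies at the origin, where the mean is 10, so the search moves through higher-variance candidates until the mean constraint is met.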

Myers & Montgomery (2002) also suggest the use of contour plots. However,
contour plots are limited to problems consisting of two control variables. The mean
contour plot for the two control variables is overlaid with the variance contour plot. This
method visually displays the optimal solution if the two optima are in the same region.
However, if they are not in the same region, one can visually assess the
tradeoffs between mean and variance by searching the control variable space.

Myers  &  Carter  (1973)  as  well  as  Vining  &  Myers  (1990)  suggest  optimizing  the 
primary  response  subject  to  the  secondary  response  (the  constraint)  through  the  use  of 
Lagrangian  multipliers.  Although  the  notation  is  slightly  different,  the  two  models  (mean 
and  variance)  are  estimated  as: 


$$\mu_y = \beta_0 + x'\beta + x'Bx, \qquad \sigma_y = \gamma_0 + x'\gamma + x'Cx \tag{2.10}$$


A constraint, $x'x = \rho^2$, is added to restrict the possible search area of the optimal
settings to a sphere (where $\rho$ represents the radius of the spherical region). The
Lagrangian multipliers are then utilized by associating $\lambda_\theta$ with the mean or variance
model and associating $\lambda_\rho$ with the above constraint (spherical region). Robinson et al.
(2004) provide an example of solving this optimization problem by minimizing variance
and keeping the mean response on a target of 500. Equation (2.11) is minimized over all
possible combinations of $\lambda_\theta$ and $\lambda_\rho$ to provide $x$, which is the optimal set of operating
conditions:

$$L = \sigma_y^2 - \lambda_\theta(\mu_y - 500) - \lambda_\rho(x'x - \rho^2) \tag{2.11}$$

Del Castillo & Montgomery (1993) took Equation (2.11) and applied the
Generalized Reduced Gradient (GRG) algorithm. GRG is utilized because the
Lagrangian multiplier method from Vining & Myers (1990) may not always produce a
local optimum, since only equality constraints are used. GRG allows for the use of
inequality constraints, which are more suitable for nonlinear problems. These authors
demonstrate the effectiveness of this algorithm for the cases of maximizing, minimizing,
or achieving a target value. This method is often preferred due to its built-in
implementation in common software such as the Microsoft Excel add-in Solver.

Lin  &  Tu  (1995)  suggest  the  incorporation  of  bias  in  the  primary  response  in 
order  to  avoid  forcing  the  estimated  mean  response  to  a  particular  value.  These  authors 
propose  minimizing  the  mean  squared  error  in  the  optimization  problem,  Equation  (2.9). 
The  three  instances  that  include  minimizing,  maximizing,  and  target  value  are  given 
below  respectively  (Lin  &  Tu,  1995;  Koksoy,  2008).  In  all  three  cases,  the  MSE  value  is 
minimized  to  provide  the  solution. 

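$$\begin{aligned}
MSE_{\min} &= \{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)] \\
MSE_{\max} &= -\{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)] \\
MSE_{\text{target}} &= \{\mu_z[y(x,z)] - T\}^2 + \sigma_z^2[y(x,z)]
\end{aligned} \tag{2.12}$$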
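
To illustrate why allowing bias can be worthwhile, the following minimal Python sketch evaluates the three MSE criteria for an estimated mean and variance. The function name `lin_tu_mse` and the two candidate settings are hypothetical and chosen only for illustration.

```python
def lin_tu_mse(mean, var, goal, target=None):
    """MSE criterion for an estimated mean and variance (Lin & Tu style)."""
    if goal == "min":
        return mean ** 2 + var
    if goal == "max":
        return -(mean ** 2) + var
    if goal == "target":
        if target is None:
            raise ValueError("a target value T is required for goal='target'")
        return (mean - target) ** 2 + var
    raise ValueError("goal must be 'min', 'max', or 'target'")

# Two hypothetical candidate settings: (estimated mean, estimated variance).
on_target = (500.0, 30.0)      # no bias, larger variance
slightly_off = (497.0, 12.0)   # bias of -3, smaller variance

mse_a = lin_tu_mse(*on_target, goal="target", target=500.0)     # 0 + 30
mse_b = lin_tu_mse(*slightly_off, goal="target", target=500.0)  # 9 + 12
```

Under the target criterion the slightly biased candidate scores lower (21 versus 30), so it would be preferred even though its mean is off target.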

Shaibu & Cho (2009) extended the ideas of Lin & Tu (1995) to incorporate target
variances into the three models. Lin & Tu assume that a variance of 0 is best in all three
situations, as it is in most experimental designs and systems. Shaibu & Cho
(2009) suggest setting a variance value (S) that the user can live with, and this target value
($T_s$) is incorporated into the variance portion of all three MSE models given above. These
MSE equations are:

$$\begin{aligned}
MSE_{\min} &= \mu(x) + (\sigma(x) - T_s)^2 \\
MSE_{\max} &= -\mu(x) + (\sigma(x) - T_s)^2 \\
MSE_{\text{target}} &= (\mu(x) - T)^2 + (\sigma(x) - T_s)^2
\end{aligned} \tag{2.13}$$

Copeland & Nelson (1996) point out that the formulation given by Lin & Tu places
no restriction on the estimated mean response. Thus, if the estimated mean response
values are large and the variances are small, the MSE criterion could place the
suggested solution far from the minimum variance value. These authors suggest placing
a restriction on the MSE search, $\sigma^2(x) + \varepsilon$, with the following constraint (given only for
the mean target value, but applicable to minimization/maximization as well):


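$$\varepsilon = \begin{cases} (\mu_y - T)^2 & \text{if } (\mu_y - T)^2 > \Delta^2 \\ 0 & \text{if } (\mu_y - T)^2 \le \Delta^2 \end{cases} \tag{2.14}$$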


Tang & Xu (2002) incorporated all these ideas, along with those of goal programming, to
derive their own all-encompassing approach to the dual response problem. Their
objective function and constraints take the form:

$$\begin{aligned}
\min \;& \delta_\mu^2 + \delta_\sigma^2 \\
\text{s.t.} \;& \mu_y - w_\mu \delta_\mu = T^*_\mu \\
& \sigma_y^2 - w_\sigma \delta_\sigma = T^*_\sigma \\
& x'x \le \rho^2 \quad \text{or} \quad x_l \le x \le x_u
\end{aligned} \tag{2.15}$$

The weights ($w_\mu$, $w_\sigma$) are user defined, and if a rectangular region is used rather than a
spherical region, the $x'x$ constraint is replaced by the bounds $x_l \le x \le x_u$. $T^*_\mu$ and $T^*_\sigma$
represent the desired target mean and variance values for $\mu_y$ and $\sigma_y^2$, respectively. The
terms in the objective function, $\delta_\mu$ and $\delta_\sigma$, are unrestricted scalar variables. Multiplied
by the weights, these terms introduce slack into the constraints. With positive weights, one
can obtain the target values exactly (slack equal to zero), overshoot them (slack greater
than zero), or undershoot them (slack less than zero).

This new optimization problem incorporates the techniques of Vining & Myers
(1990), Del Castillo & Montgomery (1993), Lin & Tu (1995), and Copeland & Nelson
(1996) as special cases, depending on the weights of the constraints. For instance, setting
$w_\mu$ and $w_\sigma$ equal to 1 and $T^*_\sigma$ equal to 0 yields the same objective function as that
derived from Lin & Tu, Equation (2.12). The other methods can also be obtained through
specific weights and target values. This formulation allows one to encompass any
established method of solving the dual response problem, or to perform goal programming
to determine a new set of weights, thus finding a new method of solving this problem.

The different approaches taken to solve the dual response problem for RPD are
summarized in Table 1. Work in this document will focus primarily on the Lin & Tu
(1995) approach to solving this dual response model. However, the choice of approach is
dependent on the application and, as such, the Lin & Tu approach is not necessarily the
optimal approach.

Table 1. Current Methodologies to Solve Dual Response Problem in RPD

| Author | Method | Objective Function | Constraints | Comments |
|---|---|---|---|---|
| Myers & Montgomery (MM) | Minimize variance, choose mean; or contour plots | $\min_x \mathrm{Var}_{(\varepsilon,z)}[y(x,z)]$ | $E_{(\varepsilon,z)}[y(x,z)] = m$ | No tradeoff considered; contour plots limited to 2 variables |
| Vining & Myers (VM) | Lagrangian multipliers | $L = \sigma_y^2 - \lambda_\theta(\mu_y - T) - \lambda_\rho(x'x - \rho^2)$ | $x'x = \rho^2$ | Difficult calculations |
| Del Castillo & Montgomery (DM) | Generalized Reduced Gradient | $L = \sigma_y^2 - \lambda_\theta(\mu_y - T) - \lambda_\rho(x'x - \rho^2)$ | $x'x \le \rho^2$ | Inequality constraints; difficult calculations |
| Lin & Tu (LT) | Mean Squared Error (MSE) | $MSE_{\min} = \{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)]$; $MSE_{\max} = -\{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)]$; $MSE_{\text{target}} = \{\mu_z[y(x,z)] - T\}^2 + \sigma_z^2[y(x,z)]$ | none | Uses tradeoffs; no restriction on $\mu_z$ |
| Shaibu & Cho (SC) | MSE with variance target value | $MSE_{\min} = \mu(x) + (\sigma(x) - T_s)^2$; $MSE_{\max} = -\mu(x) + (\sigma(x) - T_s)^2$; $MSE_{\text{target}} = (\mu(x) - T)^2 + (\sigma(x) - T_s)^2$ | $\sigma(x) \ge T_s$ | Utilizes target value for variance |
| Copeland & Nelson (CN) | LT MSE with search restriction | $MSE_{\min} = \{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)]$; $MSE_{\max} = -\{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)]$; $MSE_{\text{target}} = \{\mu_z[y(x,z)] - T\}^2 + \sigma_z^2[y(x,z)]$ | $\varepsilon = (\mu_z - T)^2$ if $(\mu_z - T)^2 > \Delta^2$; $\varepsilon = 0$ if $(\mu_z - T)^2 \le \Delta^2$ | Reduces distance $\mu_z$ can move from target value |
| Tang & Xu (TX) | Quadratic optimization problem | $\min_x \; \delta_\mu^2 + \delta_\sigma^2$ | $\mu_y - w_\mu \delta_\mu = T^*_\mu$; $\sigma_y^2 - w_\sigma \delta_\sigma = T^*_\sigma$; $x'x \le \rho^2$ or $x_l \le x \le x_u$ | Encompasses all above methods through “special cases”; weighting can be subjective |

The settings found to be optimal based on the dual response model should not
only achieve a desired mean output of the system, but should also be robust to any noise
properly modeled in the system. The following section briefly discusses “robustness.”

2.2.5  Robustness 

A robust design is one that implements a particular set of settings that
provides good mean performance and is insensitive to uncontrollable sources or variables
that cause variation (Sanchez, 1994). The key word in the previous definition is “good.”
Determining good solutions is difficult due to personal bias and differing understanding
of the process or system. “Robust” will be used in this document to mean control variable
settings that are insensitive to noise in the system. This means that the variance of the
process is relatively low across the noise space under the “robust” settings and that the
mean is close to its target value.

Each  method  in  Table  1  presents  a  solution  that  is  the  most  robust,  according  to 
the  method’s  formulation.  However,  although  the  settings  are  robust,  they  may  provide 
weaker  mean  (and  variance)  responses  than  desired  (unless  explicitly  expressed  as  a 
target  mean  or  variance).  Most  of  the  methods  “search”  the  solution  space  to  provide  a 
tradeoff  between  increases  in  expected  mean  (assuming  maximizing)  while  maintaining 
little  change  in  variance.  However,  the  further  one  moves  away  from  the  minimum 
variance value, the less robust the solution set becomes. The goal of RPD is not
necessarily to provide the best mean for the given situation, but rather to provide a
consistent mean for future implementation of the process or system with the existence of
uncontrollable noise variables (Myers & Montgomery, 2002).

2.3 Artificial Neural Networks

Methodology  has  been  discussed  on  how  to  approach  an  RPD  problem,  determine 
optimal  solutions,  and  determine  robustness  using  quadratic  regression.  The  success  of 
this  analysis  is  based  on  how  well  one  is  able  to  fit  training  data  with  a  regression  model 
(achieving  significance,  denying  lack  of  fit,  obtaining  high  r-squared  values,  etc.). 

Neural  networks  can  be  used  to  fit  the  regression  models,  rather  than  traditional 
linear/quadratic  techniques,  allowing  for  a  more  nonlinear  and  hopefully  a  better  fit. 

Artificial  neural  networks  (ANNs)  were  established  with  the  notion  that  the 
human  brain  could  be  mimicked  by  an  engineering  design  (Kuncheva,  2004).  These 
ANNs  resemble  the  biological  cognitive  systems  with  their  ability  to  “learn”  data  and 
patterns  through  the  use  of  supervised  training  for  parameter  adjustment  in  the  model. 
Many  types  of  ANNs  are  employed  in  practice  today,  each  with  different  learning  rules 
and  differences  in  the  calculation  of  outputs  for  each  specific  neural  network. 

2.3.1  ANN  Classification  and  Regression 

ANNs  are  used  for  either  classification  purposes  or  regression  analysis. 
Loeffelholz  et  al.  (2009)  successfully  demonstrated  the  use  of  four  different  ANNs  to 
classify  a  winner  in  an  NBA  basketball  game  based  simply  on  box  score  data.  The 
results  obtained  from  these  authors  showed  remarkable  improvement  in  accuracy  over  the 
“experts”  in  the  field  of  basketball  while  using  the  simplest  form  of  data  collected  in  the 
sport. In classification, ANNs can be superior to techniques such as discriminant
analysis, factor analysis, or principal component analysis due to their ability to produce
non-linear fits.

In this research, rather than determining a two or three class output system, ANNs
are used to fit regression type models. Myers & Montgomery (2002) claim that quadratic
regression will fit most real world application problems, but when this is not true, an
alternative formulation is necessary to model those problems not captured well by
quadratic regression. For instance, Davis (2009) applied RPD to a hyper-spectral
imagery problem consisting of four outputs. For two of the four outputs, very low
$R^2$ values and significant lack of fit were present when quadratic regression was used.
This result warrants the use of an alternate method, such as ANNs, to properly fit the data.

As  noted  previously,  different  neural  networks  have  been  developed  to  model 
difficult  problems.  The  neural  networks  chosen  for  this  research  are  Radial  Basis 
Function  Neural  Networks  (RBFNNs)  and  Generalized  Regression  Neural  Networks 
(GRNNs).  These  networks  were  chosen  based  on  their  applicability  to  regression 
analysis  as  well  as  a  positive  personal  experience  of  the  author  through  applying  these 
ANNs  to  real  world  problems. 

2.3.2  Radial  Basis  Function  Neural  Networks 

To begin the discussion on RBFNNs, the architecture is presented, followed
by the underlying mathematics. The RBFNN is constructed using a
layer of input nodes, a single hidden layer, and an output layer. The input layer corresponds
to the number of features or, in the case of regression, the number of (functional)
independent variables. The number of nodes in the hidden layer is equal to the number of
training exemplars presented to the input layer. For example, if a CCD consists of 23 runs,
of which 15 are allocated to the training set and the remaining eight are withheld for
testing, the hidden layer of the RBFNN would consist of 15 hidden nodes. Finally, the
output layer corresponds to the number of outputs in the system. In this research, which
uses RBFNNs for regression, the output layer consists of a single node when examining a
single response problem. Depending on the formulation of the RBFNN output, two bias
layers can also exist: an initial bias layer applied on the input and another bias
layer on the output.

An RBFNN (biases not shown) is depicted in Figure 4. $x_j$ represents the input
features, or the variables for regression. Every exemplar in the input layer is passed to
each node in the hidden layer. Each hidden layer node contains a basis function
($h_i$), which is weighted ($w_i$) to the output node. The output layer sums ($\Sigma$) all the
weighted hidden layer values to obtain the output value, $y$, for each combination of input
variables.


Figure 4. RBFNN with Single Output (diagram layers: Inputs, Hidden Layer, Summation Layer, Outputs)

For notational purposes, $x$ represents the exemplar sent through the network, $\mu_i$ is
the $i$th center, and $\sigma_i$ represents the $i$th spread. To explain this network in simple terms,
the classification application is discussed first. The exemplar is sent through the network
and its distance from the centers (the trained exemplars) is calculated; an exemplar firing
closest to a particular center scores a value closest to that used in training. For instance, if
two centers with input values of [1 2] and [4 5], representing class 1 and class 2,
respectively, are trained in an RBFNN, a new exemplar with input values of [1 2] is most
likely to fire closest to center [1 2] and be labeled class 1.

In  terms  of  regression,  the  same  principle  is  applied.  Each  new  input  value  is 
measured  against  all  trained  values  (hidden  nodes)  to  determine  their  “distance”  from 
each  node.  The  new  input  value  is  then  assigned  an  output  value  closely  resembling  the 
output  value  for  the  hidden  node  with  the  closest  activation.  Bias  and  weights  are 
incorporated  to  allow  the  output  values  to  vary  around  the  actual  output  value  of  the 
hidden  node,  thus  leading  to  a  function  rather  than  discrete  points  as  used  in  classification 
purposes. 

Training in an RBFNN is simpler and quicker than in other networks (such as
Feed-Forward). To train the network, each exemplar is fed through the hidden nodes, one
at a time, obtaining an output. A hidden weight is obtained and the overall weight of the
network is adjusted. This is done by the following equation:

$$w_i(n+1) = w_i(n) + \eta (t - y) z_i \tag{2.18}$$

Here $w_i$ represents the weights of the network, $n$ is the iteration number, $\eta$ is the step
size, $t$ is the target value, $y$ is the network output, and $z_i = h_i(x)$.
Each exemplar is sent through the network in turn and the weights are updated. This
process is continued until an acceptably small total error is reached, indicating a
well-trained RBFNN. Since only a linear output layer is used outside the hidden layer,
Wasserman (1993) notes that the RBF is guaranteed to converge to a global minimum
(whereas other networks can be trapped in local minima), but the network can
be extremely large depending on the number of training exemplars used.
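
The weight update in Equation (2.18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the training routine used in this research: the basis function is taken to be Gaussian, every training exemplar is used as a center, and the names `rbf_activations` and `train_rbf_weights`, the step size, the stopping tolerance, and the toy data are all hypothetical.

```python
import math

def rbf_activations(x, centers, spread):
    """Gaussian basis outputs z_i = h_i(x), one per hidden node (assumed form)."""
    return [math.exp(-sum((a - c) ** 2 for a, c in zip(x, ctr)) / (2 * spread ** 2))
            for ctr in centers]

def train_rbf_weights(inputs, targets, spread, eta=0.1, tol=1e-8, max_epochs=10000):
    """Train output weights with w_i(n+1) = w_i(n) + eta * (t - y) * z_i."""
    centers = [list(x) for x in inputs]  # one hidden node per training exemplar
    weights = [0.0] * len(centers)
    for _ in range(max_epochs):
        total_error = 0.0
        for x, t in zip(inputs, targets):
            z = rbf_activations(x, centers, spread)
            y = sum(w * zi for w, zi in zip(weights, z))  # linear output layer
            weights = [w + eta * (t - y) * zi for w, zi in zip(weights, z)]
            total_error += (t - y) ** 2
        if total_error < tol:  # stop once the total squared error is small
            break
    return centers, weights

# Toy one-variable data set, for illustration only.
inputs = [[-1.0], [0.0], [1.0]]
targets = [1.0, 0.0, 1.0]
centers, weights = train_rbf_weights(inputs, targets, spread=0.5)
predictions = [sum(w * zi for w, zi in zip(weights, rbf_activations(x, centers, 0.5)))
               for x in inputs]
```

Because the output layer is linear in the weights, the squared-error surface has a single minimum, which is why this simple update settles on the training targets for the toy data.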

Once  a  RBFNN  is  trained,  new  exemplars  can  be  processed  through  the  network 
to  obtain  an  appropriate  output  value.  In  the  literature,  Duda  et  al.  (2001),  Looney 
(1997),  and  Wasserman  (1993)  present  the  following  equation  to  be  used  in  calculating 
an  output  value  for  a  given  input: 


$$z^{(i)} = \sum_{j=1}^{p} w_j \exp\left(-\frac{1}{2\sigma^2} \sum_{k=1}^{n} \left(x_k^{(i)} - \mu_k^{(j)}\right)^2\right) \tag{2.19}$$


For exemplar $i$, this formula assumes that $p$ centers exist for $n$ features (or input
variables). $x_k^{(i)}$ represents the $k$th input feature/variable of the new exemplar and
$\mu_k^{(j)}$ represents the $k$th component of the $j$th center. The distance of the new exemplar
from the center of each trained exemplar is calculated. Here $\sigma$ represents the spread and
$w_j$ is equal to the weight of the $j$th node. In regression terms, this provides the
expected output value for any given set of input variable values.
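
As a minimal sketch of Equation (2.19), the following Python function computes the network output for one new exemplar. The function name `rbfnn_output` and the weights are hypothetical; the two centers reuse the [1 2] and [4 5] example given earlier, and a single shared spread is assumed.

```python
import math

def rbfnn_output(x, centers, weights, spread):
    """Weighted sum of Gaussian basis functions, one per center."""
    total = 0.0
    for center, w in zip(centers, weights):
        # Squared distance between the new exemplar and this center.
        sq_dist = sum((xk - mk) ** 2 for xk, mk in zip(x, center))
        total += w * math.exp(-sq_dist / (2.0 * spread ** 2))
    return total

centers = [[1.0, 2.0], [4.0, 5.0]]
weights = [1.0, 2.0]  # hypothetical trained weights
z_at_center = rbfnn_output([1.0, 2.0], centers, weights, spread=1.0)
```

At the first center the output is dominated by that center's weight, because the contribution of the distant second center decays exponentially with squared distance.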

In this research, MATLAB® is employed to make calculations easier for large
scale problems. MATLAB® uses a slightly different formulation for determining the
output value of an RBFNN. The formulation used is represented as:

$$z^{(i)} = \sum_{j=1}^{p} w_j \exp\left(-\sum_{k=1}^{n} \left(b_1 \left(x_k^{(i)} - \mu_k^{(j)}\right)\right)^2\right) + b_2 \tag{2.20}$$

In this equation, the initial bias term ($b_1$) represents MATLAB®'s interpretation
of applying the spread in the equation. The bias term is calculated as 0.8326/spread rather
than dividing by twice the squared spread as in Equation (2.19). Also, a second bias term
($b_2$) is added on the end of the equation to represent a linear layer bias term. This simply
shifts the output value up or down by the specified amount.
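
The constant 0.8326 is approximately the square root of ln 2, so with this bias a basis function returns about 0.5 when the exemplar lies exactly one spread away from a center. The Python sketch below illustrates that property. It mirrors the form of Equation (2.20) but is not MATLAB® code, and the function name `matlab_style_rbf_output` is hypothetical.

```python
import math

def matlab_style_rbf_output(x, centers, weights, spread, b2=0.0):
    """Output in the form of Equation (2.20), with b1 = 0.8326 / spread."""
    b1 = 0.8326 / spread
    total = b2  # linear-layer bias shifts the output up or down
    for center, w in zip(centers, weights):
        dist = math.sqrt(sum((xk - mk) ** 2 for xk, mk in zip(x, center)))
        total += w * math.exp(-(b1 * dist) ** 2)
    return total

# One center at 0 with unit weight; evaluate at a distance equal to the spread.
half = matlab_style_rbf_output([2.0], [[0.0]], [1.0], spread=2.0)
```

Under this parameterization the spread can be read as the distance at which a hidden node's response falls to roughly one half of its peak.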

For a more in-depth examination of RBFNNs, Duda et al. (2001), Looney (1997),
and Wasserman (1993) are appropriate texts. These authors also provide deeper insight
into the origination of RBFNNs, a deeper understanding of the training approach, as well
as other details of RBFNNs.


2.3.3  Generalized  Regression  Neural  Networks 

Generalized Regression Neural Networks (GRNNs) belong to the same class as
RBFNNs and are useful for non-linear regression (Wasserman, 1993). The
architecture is similar, but the calculations differ in terms of training. More specifically,
there is no iterative training in the GRNN. Response values are calculated directly from
the network, with the weights taken directly from the training response values. GRNNs are
extremely useful compared to other neural networks due to their ability to converge on a
function with little training data.

Specht (1991) applied the ideas of the Normal Distribution to create the
formulation of the GRNN. The formula, where $D_i^2 = (X - X^i)^T (X - X^i)$, is given as:

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y^i \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\left(-\frac{D_i^2}{2\sigma^2}\right)} \tag{2.21}$$

The scalar function, $D_i^2$, calculates the squared distance of the new exemplar ($X$) from
each of the “training” exemplars ($X^i$). The spread value, $\sigma^2$, is defined in the same way
as for RBFNNs. Finally, $Y^i$ represents the weights in the network, which are
taken from the outputs of the “training” exemplars. This feature distinguishes the
GRNN from the RBFNN in that the weights do not need to be calculated or updated in
the GRNN.


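To remain consistent with the notation used for the RBFNN, the GRNN formula can be
rewritten as:

$$z = \frac{\sum_{j=1}^{p} w_j \exp\left(-\sum_{k=1}^{n} \left(b_1 \left(x_k - \mu_k^{(j)}\right)\right)^2\right)}{\sum_{j=1}^{p} \exp\left(-\sum_{k=1}^{n} \left(b_1 \left(x_k - \mu_k^{(j)}\right)\right)^2\right)} \tag{2.22}$$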

The features/variables of the new test point are represented by $x$, with its output as
$z$. Weights for each “training” exemplar are represented as $w$ and are taken from the
response for each feature setting. The centers of the “training” exemplars are $\mu_k^{(j)}$, and all
remaining terms were defined for RBFNNs. The GRNN based on Equation (2.22) is
depicted in Figure 5, which is modified from the figure originally given by Specht (1991).
$\alpha$ represents the numerator while $\beta$ represents the denominator of Equation (2.22).


Figure 5. Generalized Regression Neural Network (diagram layers: Inputs, Hidden Layer, Summation Layer, Division Layer, Outputs)

This  regression  allows  for  direct  application  into  problems  involving  numerical 
data  (Specht,  1991).  GRNNs  calculate  quickly  since  weights  do  not  have  to  be  calculated 
and  updated  separately.  For  more  information  on  the  construction  of  GRNNs,  the  reader 
is  referred  to  Wasserman  (1997)  or  Specht  (1991). 
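
Because no weights are trained, a GRNN prediction reduces to a Gaussian-kernel weighted average of the training responses. The following minimal Python sketch assumes the spread-based kernel described by Specht (1991); the function name `grnn_predict` and the toy data are hypothetical.

```python
import math

def grnn_predict(x, train_inputs, train_outputs, spread):
    """Kernel-weighted average of training outputs (numerator / denominator)."""
    numerator = 0.0
    denominator = 0.0
    for xi, yi in zip(train_inputs, train_outputs):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))  # squared distance
        kernel = math.exp(-d2 / (2.0 * spread ** 2))
        numerator += yi * kernel   # training outputs act as the weights
        denominator += kernel
    # Note: if x is far from every exemplar relative to the spread, the
    # kernels can underflow to zero and this division would fail.
    return numerator / denominator

train_inputs = [[0.0], [1.0], [2.0]]
train_outputs = [0.0, 1.0, 4.0]
smooth = grnn_predict([1.0], train_inputs, train_outputs, spread=0.5)
sharp = grnn_predict([1.0], train_inputs, train_outputs, spread=0.1)
```

A small spread makes the prediction at a training point reproduce that point's response almost exactly, while a larger spread blends in neighboring responses and smooths the fitted function.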


2.4  Multiple  Responses 

Many real-world application problems involve multiple responses
rather than a single response. Often the same chosen settings of the control variables
will affect each response differently. Traditional
research (Myers & Montgomery, 2002; Robinson et al., 2004) focuses on single response
problems in RPD. Several approaches have been taken to solve multiple response RPD
problems and are presented in the following sections.

2.4.1  Weighting  the  Responses 

Perhaps the simplest method of solving multiple response problems is to
assign weights to each of the response values to form a linear combination (Koksoy,
2008), i.e., $y = \sum_{i=1}^{m} w_i y_i$, where $w_i$ is the weight of the $i$th response of $m$ responses.
Optimization is then performed relative to $y$. One difficulty with this technique is selecting
appropriate values for the weights. The author indicates that expert opinion is usually
elicited for the weights. Such methods could lead to ambiguity.

Combating  this  problem,  Decision  Analysis  (Kirkwood,  1997)  can  be  applied 
through  the  use  of  value  models.  Once  again,  an  expert  must  be  utilized,  but 
mathematical  procedures  are  used  to  obtain  appropriate  weights  for  each  of  the  response 
values.  Also,  this  approach  allows  the  user  to  change  weights  and  see  the  changes 
instantly. 

Kuhnt  &  Erdbrugge  (2004)  extend  the  idea  of  using  weights  by  applying  a  loss 
function.  These  authors  apply  the  loss  function  to  multi-response  problems  to  minimize 
the  overall  expected  loss  when  applying  different  combinations  of  weights  to  all  the 
responses. Graphs are utilized to show the expected loss values at different settings of
the weights.

2.4.2 Visualizing the Responses

For  a  small  number  of  responses,  contour  plots  can  be  generated  for  the  response 
functions  (Lind  et  al.,  1960).  Overlaying  the  contour  plots  of  each  of  the  responses 
highlights  areas  in  which  the  settings  of  the  control  variables  prove  optimal.  This 
approach  becomes  increasingly  difficult  when  more  responses  are  added  to  the  problem. 
Also,  another  downside  may  be  the  lack  of  good  areas  found  if  the  responses  differ  quite 
dramatically  across  the  setting  space  of  the  control  variables. 

2.4.3  Desirability  Functions 

Desirability  functions  can  be  used  to  determine  the  optimal  settings  for  the  control 
variables.  This  approach  is  similar  to  weighting  the  responses.  Derringer  &  Suich  (1980) 
first  proposed  the  idea  of  desirability  functions  by  converting  each  response  into  its  own 
desirability  function  that  covers  the  range  of  zero  to  one.  A  value  of  one  represents  the 
response  achieving  its  goal  while  a  value  of  zero  indicates  the  response  is  outside  the 
specified  acceptable  region  of  interest.  The  scores  for  each  response  are  multiplied 
together  and  taken  to  the  mth  root,  where  m  represents  the  number  of  responses.  This  is 
represented  as: 

$$D = \sqrt[m]{d_1 d_2 \cdots d_m} \tag{2.23}$$

The desirability function used can vary (Zandieh et al., 2009; Chang, 2006;
Derringer & Suich, 1980; Harrington, 1965) depending on the user’s preference. One
such function is the exponential function. Myers & Montgomery (2002) adapt the
functions presented by Derringer & Suich (1980), creating three desirability functions
based on maximizing, minimizing, or achieving a middle ground value. $T$ represents the
target value, $L$ and $U$ are the lower and upper limits of the responses, and $r$ represents
how important achieving the target value is. A value of $r = 1$ causes the function to be
linear, $r > 1$ places a larger emphasis on achieving the target value, while $0 < r < 1$ puts
less emphasis on achieving the target value. The equations presented represent the
desirability functions for when one maximizes, minimizes, or seeks some target value,
respectively:


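$$d^{(\max)} = \begin{cases} 0 & y < L \\ \left(\dfrac{y - L}{T - L}\right)^r & L \le y \le T \\ 1 & y > T \end{cases} \tag{2.24}$$

$$d^{(\min)} = \begin{cases} 1 & y < T \\ \left(\dfrac{U - y}{U - T}\right)^r & T \le y \le U \\ 0 & y > U \end{cases} \tag{2.25}$$

$$d^{(\text{target})} = \begin{cases} 0 & y < L \\ \left(\dfrac{y - L}{T - L}\right)^r & L \le y \le T \\ \left(\dfrac{U - y}{U - T}\right)^r & T \le y \le U \\ 0 & y > U \end{cases} \tag{2.26}$$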
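
The three desirability functions and the overall desirability of Equation (2.23) can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, a single exponent `r` is assumed on both sides of the target, and the response values and limits in the example are invented.

```python
def d_max(y, low, target, r=1.0):
    """Desirability when the response is to be maximized."""
    if y < low:
        return 0.0
    if y > target:
        return 1.0
    return ((y - low) / (target - low)) ** r

def d_min(y, target, high, r=1.0):
    """Desirability when the response is to be minimized."""
    if y < target:
        return 1.0
    if y > high:
        return 0.0
    return ((high - y) / (high - target)) ** r

def d_target(y, low, target, high, r=1.0):
    """Desirability when the response should hit a target value."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** r
    return ((high - y) / (high - target)) ** r

def overall_desirability(ds):
    """Geometric mean of the individual desirabilities."""
    product = 1.0
    for d in ds:
        product *= d
    return product ** (1.0 / len(ds))

# Three hypothetical responses, one of each type.
ds = [d_max(75.0, 50.0, 100.0), d_min(3.0, 2.0, 4.0), d_target(10.0, 5.0, 10.0, 20.0)]
D = overall_desirability(ds)
```

Because the individual scores are multiplied, a single response with a desirability of zero forces the overall desirability to zero regardless of the other responses.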


Zandieh et al. (2009) utilized Equations (2.24)-(2.26) to construct their
optimization problem, including a constraint. For $k$ responses, they obtained the
following mathematical model:

$$\max_{x} \; D = \sqrt[k]{d_1(y_1) \times d_2(y_2) \times \cdots \times d_k(y_k)} \quad \text{s.t.} \quad L(x_h) \le x_h \le U(x_h) \tag{2.27}$$

These authors suggest optimizing Equation (2.27) through the use of search
algorithms such as Genetic Algorithms, Tabu Search, and Simulated Annealing. The
authors found Simulated Annealing to perform the best in terms of multiple response
RPD problems.

Chang  (2006,  2008)  follows  a  similar  approach  to  the  desirability  functions  above 
by  implementing  the  use  of  a  back-propagation  network  (BPN).  Chang’s  procedure 
consisted  of  modeling  the  different  response  models  through  a  BPN,  evaluating  the 
chosen  design  space  of  levels,  and  using  exponential  desirability  functions  and  Simulated 
Annealing  to  determine  settings  for  the  parameters. 

2.4.4  MSE  and PCA 

Koksoy (2008) utilizes the Lin & Tu (1995) methodology (previously described
in Section 2.2.4), which uses mean squared error to combine the mean and
variance models. However, Koksoy assumes that multiple responses exist; he proposes
the following optimization problem for $i = 1, 2, \ldots, r$ responses:

$$\min_{x \in R} \; MSE(x)_j \;\; (j \ne i) \quad \text{s.t.} \quad MSE(x)_i = MSE(x)_{i0} \tag{2.28}$$

Previously, the region ($R$) was defined as $x'x = \rho^2$, which applies here as well if
a spherical region is used. This method optimizes the chosen objective function,
$MSE(x)_j$ ($j \ne i$), while fixing the values of the remaining MSE functions at $MSE(x)_{i0}$.
Koksoy presents a two response problem and solves it by incrementing the fixed value of
one MSE function at each iteration and optimizing the other MSE function at that
iteration. This approach leads to a table of alternative solutions, or a portfolio of solutions,
allowing the user to examine the tradeoffs of choosing one set of settings over another in
terms of changes in the different MSE values. Koksoy (2008) solves this optimization
problem through the use of the Generalized Reduced Gradient (GRG) method developed
in nonlinear programming.

Su  &  Tong  (1997)  apply  Principal  Component  Analysis  (PCA)  to  examine 
correlation  among  responses  and  utilize  the  component  scores  rather  than  raw  responses. 
These  authors  apply  PCA  to  the  crossed  array  design  approach  as  suggested  by  Taguchi 
(1986).  The  raw  responses  are  transformed  into  principal  component  scores  and  those 
scores  kept  are  based  on  eigenvalues  scoring  higher  than  one.  Factor  plots  are  used  on 
the  component  scores  (much  like  the  factor  plots  in  SNR)  to  determine  optimal  settings 
for  each  control  factor  in  the  experiment.  This  method  proved  to  reduce  the 
dimensionality  of  the  problem  and  decrease  the  impact  of  its  complexity. 

Ideas from these two methods will be applied in Chapters 3 and 4 to formulate a
new way of examining multiple response problems. First, in a similar fashion to Koksoy
(2008), the Lin & Tu (1995) methodology will be applied to various single and multiple
response problems in RPD. Also, much like the PCA approach of Su & Tong (1997),
factor analysis will be implemented and used in a combined array design rather than the
Taguchi method of SNRs and crossed array designs. The next section details factor
analysis and how it is applied in this research.

2.5  Factor  Analysis 

Factor  analysis  is  a  data  reduction  technique  that  attempts  to  discover  underlying 
factors  that  link  two  or  more  variables  with  one  another  (Dillon  &  Goldstein,  1984).  This 
analysis  takes  seemingly  unrelated  variables  and  finds  some  linear  combination  to 
combine them, thus determining commonalities between factors. In classification
problems, factor analysis helps group similar classes with one another, allowing for easier
differentiation between classes. Factor analysis allows one to work in a space of much
lower dimension by working with factor scores as opposed to raw data sets.

2.5.1  Factor  Analysis  and  PCA 

Factor analysis and principal component analysis (PCA) are similar, with one key
difference: how the variance is explained. PCA assumes that the total variance of the
variables is included in the components, which allows for no error variance. With factor
analysis, an error variance is assumed since the commonalities are estimated. Factor
analysis looks for common or shared variation rather than attempting to account for all
of the total variation (Dillon & Goldstein, 1984).


2.5.2  Mathematical  Model  of  Factor  Analysis 

To  mathematically  represent  factor  analysis,  the  following  algebraic 
representation  is  used: 


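$$
\begin{aligned}
X_1 &= v_{1(1)}CF_{(1)} + v_{1(2)}CF_{(2)} + \cdots + v_{1(m)}CF_{(m)} + e_1 \\
X_2 &= v_{2(1)}CF_{(1)} + v_{2(2)}CF_{(2)} + \cdots + v_{2(m)}CF_{(m)} + e_2 \\
&\;\;\vdots \\
X_p &= v_{p(1)}CF_{(1)} + v_{p(2)}CF_{(2)} + \cdots + v_{p(m)}CF_{(m)} + e_p
\end{aligned}
\tag{2.29}
$$
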


This  representation  assumes  that  one  has  m  common  factors  on  p  variables.  The 
number of common factors must be less than the number of variables (m < p).

2.5.3 Extracting Factors


Several methods exist for extracting factors from the data set. Dillon & Goldstein
(1984) summarize several of these methods, which are presented in Table 2. The
methods differ in the factor solutions they produce. Therefore, the authors recommend
choosing a method based on the sample size, number of variables, and variation among
variables.

Table  2.  Types  of  Factor  Extraction  Methods 

Principal  Components 

Principal  Factor 
Minimum  Residual 
Image 
Alpha 

Maximum  Likelihood 
Canonical  Maximum  Likelihood 


The  research  presented  in  this  document  will  use  the  principal  components 
method  for  obtaining  factors  and  calculating  factor  scores.  This  method  maximizes  the 
variance  accounted  for,  which  is  set  by  the  user.  The  eigenvalues  are  used  to  explain  the 
amount  of  total  variation  by  each  factor. 
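
As a rough illustration of the principal components method described above, the following Python sketch extracts factors from the correlation matrix of a response matrix and keeps those with eigenvalues greater than one. The function name `pc_factor_extraction`, the eigenvalue cutoff argument, and the synthetic data are assumptions made for this example only; they are not taken from this research.

```python
import numpy as np

def pc_factor_extraction(Y, min_eigenvalue=1.0):
    """Principal-components factor extraction from an (n x k) response matrix."""
    Y = np.asarray(Y, dtype=float)
    # Standardize each response so the analysis uses the correlation matrix.
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    R = np.corrcoef(Y, rowvar=False)
    # Eigen-decomposition, sorted from largest to smallest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    # Factor scores with unit variance for the retained factors.
    scores = Z @ eigvecs[:, keep] / np.sqrt(eigvals[keep])
    # Share of total variation explained by each eigenvalue.
    explained = eigvals / eigvals.sum()
    return eigvals, loadings, scores, explained

# Hypothetical data: four responses driven by two latent factors plus noise.
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
noise = 0.3 * rng.normal(size=(200, 4))
Y = np.column_stack([f[:, 0], 0.9 * f[:, 0], f[:, 1], -0.8 * f[:, 1]]) + noise
eigvals, loadings, scores, explained = pc_factor_extraction(Y)
```

In this synthetic case the four responses collapse to two retained factors, and the columns of `scores` could then serve as the reduced set of responses.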

2.5.4  Factor  Rotation  and  Factor  Scores 

Along with factor analysis, a varimax rotation is often applied to the data set to
obtain a simpler structure for the factors. Thurstone (1947) presented the idea of factor
rotation to allow for easier interpretation, particularly graphically. The varimax rotation
takes the variance of the squared factor loadings and attempts to maximize their sum.
The primary use of factor rotation is simple interpretability (Dillon & Goldstein, 1984;
Anderson, 2003).
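
The varimax idea can be sketched numerically. The Python example below uses a commonly used SVD-based iteration for orthogonal rotation; it is one possible way to compute a varimax rotation and is not claimed to be the procedure used in this research. The loading matrix `A0`, the 30 degree mixing rotation, and the helper names are hypothetical.

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=100):
    """Rotate a (k x m) loading matrix toward the varimax criterion."""
    A = np.asarray(loadings, dtype=float)
    k, m = A.shape
    T = np.eye(m)          # orthogonal rotation matrix, updated iteratively
    crit = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient-like term of the varimax criterion.
        G = A.T @ (L**3 - L * (L**2).sum(axis=0) / k)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        crit_new = s.sum()
        if crit != 0.0 and crit_new < crit * (1 + tol):
            break
        crit = crit_new
    return A @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(np.asarray(L) ** 2, axis=0).sum()

# Hypothetical simple-structure loadings, deliberately mixed by a 30 degree rotation.
A0 = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.85]])
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
A = A0 @ mix
rotated, T = varimax(A)
```

Because the rotation is orthogonal, the row sums of squared loadings are unchanged; only how the variance is distributed across factors changes, which is what makes the rotated factors easier to interpret.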

Factor  (rotated)  scores  are  estimated  and  represent  the  location  of  the  observation 
in  terms  of  the  space  projected  by  the  factors  (Dillon  &  Goldstein,  1984).  These  scores 
allow  for  a  graphical  representation  of  how  the  different  observations  lie  on  a  coordinate 
axis,  allowing  one  to  find  clusters  or  links  between  observations  visually.  Once  the 
scores  are  estimated,  they  can  be  used  as  the  new  response  data  for  the  design  matrix. 

2.5.5  Summary 

Factor analysis is utilized in this research to reduce the dimensionality of multiple
response problems. Rather than examining the independent variables, the responses are
examined in an attempt to find common factors to allow for grouping of different responses.
This grouping assists in reducing the dimensionality of the multiple responses. Also,
factor scores are obtained for each of the new factors, which are interpreted as new
responses to be analyzed. Performing this analysis allows the user to understand
underlying correlations or similarities between different responses that may not have
otherwise been seen in the problem.

III. Methodology


3.1  Overview 

Methodology  employed  in  this  research  is  covered  in  this  chapter.  First,  RPD  is 
demonstrated  using  the  Lin  &  Tu  (1995)  methodology  for  choosing  optimal  settings  in 
problems  containing  noise  variables.  Next,  a  new  methodology  is  developed  to 
determine  alternative  settings  designed  to  guard  against  possible  system  degradation  in 
the  future.  These  are  known  as  “doubly  robust”  settings. 

Artificial neural networks (ANNs) are introduced as an alternative to quadratic
regression in RPD for more complex (non-linear) situations. Two examples are provided
to compare/contrast settings obtained through quadratic regression and ANNs.
Following this, a framework is developed to determine doubly robust settings using
ANNs.

Finally,  factor  analysis  is  implemented  for  multi-response  problems  to  reduce 
dimensionality  of  the  outputs  to  a  single  dimension  without  the  use  of  subject  matter 
experts. 

3.2  Robust  Parameter  Design 

As  discussed  in  Chapter  2,  RPD  is  performed  using  a  crossed  array  design 
combined  with  signal  to  noise  ratio  values  (Taguchi,  1990)  or  by  implementing  a 
combined  array  design  and  using  response  surface  methodology  (Myers  &  Montgomery, 
2002). Both crossed array and combined array designs were utilized in this research, but
the focus was on response surface methodology techniques due to their ability to capture
control by control variable interactions. These interactions allowed for better understanding
of the true nature of the design space's effects on the solution space.

To conduct RPD, the single model approach was used to create an overall
response model, also known as a process model. For a problem with p control variables
(x) and q noise variables (z), this equation is:

$$ y(x,z) = \beta_0 + x'\beta + x'Bx + z'\gamma + x'\Delta z + \varepsilon \tag{3.1} $$

$\beta$ are the coefficients of the control variables, $\gamma$ are the coefficients of the noise
variables, $B$ is the matrix of coefficients for control by control interactions and $\Delta$ is the
matrix of coefficients for control by noise interactions. Finally, $\varepsilon$ represents the error
and is distributed normally with a mean of zero and variance, $\sigma^2$, which is estimated by
the mean squared error. This process model is obtained through least squares regression. For
the coefficients (assuming $X'X$ is invertible), the formula $(X'X)^{-1}(X'Y)$ is
implemented where X represents the design matrix (from the crossed or combined array)
of both the control and noise variables and Y represents the responses obtained during
experimentation. From this process model, the mean and variance models can be
computed directly. These two models are given by:

$$ E_{(z,\varepsilon)}[y(x,z)] = \beta_0 + x'\beta + x'Bx \tag{3.2} $$

$$ V_{(z,\varepsilon)}[y(x,z)] = (\gamma + \Delta'x)'(\gamma + \Delta'x) + \sigma^2 \tag{3.3} $$
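
The mean and variance models in Equations (3.2) and (3.3) can be evaluated directly once the coefficients are known. The sketch below is a minimal illustration, assuming coded noise variables with zero mean and unit variance; the function names are hypothetical, and the coefficient values are the rounded estimates of the Semiconductor example reported later in this chapter. The off-diagonal entries of `B` are half the x1x2 coefficient so that x'Bx reproduces the interaction term.

```python
import numpy as np

def mean_model(x, beta0, beta, B):
    """Equation (3.2): expected response at control settings x."""
    x = np.asarray(x, dtype=float)
    return beta0 + x @ beta + x @ B @ x

def variance_model(x, gamma, Delta, sigma2):
    """Equation (3.3): response variance at control settings x."""
    x = np.asarray(x, dtype=float)
    slope = gamma + Delta.T @ x   # sensitivity of y to each noise variable
    return slope @ slope + sigma2

# Rounded coefficients from the Semiconductor example (2 control, 3 noise variables).
beta0 = 30.37
beta = np.array([-2.92, -4.13])
B = np.array([[2.6, 2.87 / 2],
              [2.87 / 2, 2.18]])
gamma = np.array([2.73, -2.33, 2.33])
Delta = np.array([[-0.27, 0.89, 2.58],    # x1 by z1, z2, z3
                  [2.01, -1.43, 1.56]])   # x2 by z1, z2, z3
sigma2 = 0.9526

x = [0.13, 0.71]
m = mean_model(x, beta0, beta, B)
v = variance_model(x, gamma, Delta, sigma2)
```

Expanding `variance_model` symbolically in x1 and x2 gives a quadratic polynomial whose constant term is the sum of the squared noise coefficients plus the error variance.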

After computing these two models, an optimization problem is solved to minimize
the variance subject to achieving a target mean. Solving this dual response problem
requires choosing an approach outlined in Table 1. The Lin & Tu (1995) method was
selected for this research, which henceforth will be known as the LT solution. As a
reminder, the LT solution is obtained using one of three different objective functions in
a minimization framework, depending on the problem type.

A  minimization  problem  (min)  requires  the  response  to  be  nonnegative  since  the 
mean  segment  of  the  MSE  equation  is  squared.  An  assumption  is  made  that  a  response 
which  equals  zero  is  desired.  For  maximization  problems  (max),  the  response  must  also 
be  nonnegative  with  no  restriction  on  its  upper  bound.  If  the  response  can  be  negative  or 
a  specific  value  other  than  zero  (for  min)  or  infinity  (for  max)  is  desired,  target  value 
problems  need  to  be  implemented. 

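$$
\begin{aligned}
\mathrm{MSE}_{\min} &= \{E_{(z,\varepsilon)}[y(x,z)]\}^2 + V_{(z,\varepsilon)}[y(x,z)] \\
\mathrm{MSE}_{\max} &= -\{E_{(z,\varepsilon)}[y(x,z)]\}^2 + V_{(z,\varepsilon)}[y(x,z)] \\
\mathrm{MSE}_{\mathrm{target}} &= \{E_{(z,\varepsilon)}[y(x,z)] - T\}^2 + V_{(z,\varepsilon)}[y(x,z)]
\end{aligned}
\tag{3.4}
$$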

This  concludes  the  discussion  of  a  standard  methodology  for  finding  the  optimal 
robust  settings.  The  robust  settings  obtained  here  represent  settings  that  are  hopefully 
insensitive  to  the  noise  variables  in  the  system  and  obtain  the  target  mean.  The  next 
section  presents  a  small  example  problem  to  demonstrate  the  RPD  methodology. 

3.2.1  RPD  Example  Problem 

To demonstrate the RPD methodology, a small textbook problem was adapted
from Myers & Montgomery (2002: 566). The problem involved a semiconductor
process which contained two controllable variables and three noise variables. It was
desired that the system response be minimized. All settings are presented as coded terms
since their natural values are of no interest in this demonstration. The text used a
fractional CCD consisting of 23 runs, including center points to allow for estimation of
curvature. Table 3 displays the problem data.


Table 3. Semiconductor Example Problem

Run Number   x1   x2   z1   z2   z3   y
 1           -1   -1   -1   -1    1   44.2
 2            1   -1   -1   -1   -1   30
 3           -1    1   -1   -1   -1   30
 4            1    1   -1   -1    1   35.4
 5           -1   -1    1   -1   -1   49.8
 6            1   -1    1   -1    1   36.3
 7           -1    1    1   -1    1   41.3
 8            1    1    1   -1   -1   31.4
 9           -1   -1   -1    1   -1   43.5
10            1   -1   -1    1    1   36.1
11           -1    1   -1    1    1   22.7
12            1    1   -1    1   -1   16
13           -1   -1    1    1    1   43.2
14            1   -1    1    1   -1   30.3
15           -1    1    1    1   -1   30.1
16            1    1    1    1    1   39.2
17           -2    0    0    0    0   46.1
18            2    0    0    0    0   36.1
19            0   -2    0    0    0   47.4
20            0    2    0    0    0   31.5
21            0    0    0    0    0   30.8
22            0    0    0    0    0   30.7
23            0    0    0    0    0   31
Quadratic regression was performed on the data in Table 3 to obtain the process
model, Equation (3.5), with each coefficient estimate rounded to the second decimal.
The error term for regression was calculated as $\varepsilon \sim \mathrm{Normal}(0, \sqrt{0.9526})$.

$$
\begin{aligned}
y(x,z) ={}& 30.37 - 2.92x_1 - 4.13x_2 + 2.6x_1^2 + 2.18x_2^2 + 2.87x_1x_2 + 2.73z_1 - 2.33z_2 \\
& + 2.33z_3 - 0.27x_1z_1 + 0.89x_1z_2 + 2.58x_1z_3 + 2.01x_2z_1 - 1.43x_2z_2 + 1.56x_2z_3 + \varepsilon
\end{aligned}
\tag{3.5}
$$
Computing Equations (3.2) and (3.3) based on Equation (3.5) yielded the
following mean and variance models:

$$ E_{(z,\varepsilon)}[y(x,z)] = 30.37 - 2.92x_1 - 4.13x_2 + 2.6x_1^2 + 2.18x_2^2 + 2.87x_1x_2 \tag{3.6} $$

$$ V_{(z,\varepsilon)}[y(x,z)] = 19.26 + 6.4x_1 + 24.9x_2 + 7.52x_1^2 + 8.52x_2^2 + 4.42x_1x_2 \tag{3.7} $$

The overall process model required the response to be minimized; thus, the LT
equation used is:

$$ \mathrm{MSE}_{\min} = \{E_{(z,\varepsilon)}[y(x,z)]\}^2 + V_{(z,\varepsilon)}[y(x,z)] \tag{3.8} $$

Due  to  the  small  number  of  control  variables,  all  possible  setting  combinations 
between  -1  and  1  with  a  step  size  of  .01  were  tested.  Recall,  these  values  were  in  coded 
terms.  All  possible  combinations  were  applied  to  Equation  (3.8)  to  produce  an  LT 
solution  for  each  combination.  The  minimum  LT  value  was  chosen  and  the  associated 
control  variable  settings  represented  the  optimal  robust  pair  of  settings  for  this  problem, 
as  shown  in  Table  4. 

Table 4. LT Settings and Solution to Semiconductor Problem

x1     x2     Est. Mean   Est. Std Dev   Est. LT
0.13   0.71   28.463      6.53           852.74
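
The exhaustive search over coded settings can be sketched as follows. This is a minimal illustration that evaluates Equation (3.8) on a grid with step 0.01 using the rounded coefficients of Equations (3.6) and (3.7); the function names are hypothetical, and because the coefficients are rounded, the resulting LT value agrees with Table 4 only approximately.

```python
import numpy as np

def mean_3_6(x1, x2):
    """Mean model, Equation (3.6), with rounded coefficients."""
    return 30.37 - 2.92 * x1 - 4.13 * x2 + 2.6 * x1**2 + 2.18 * x2**2 + 2.87 * x1 * x2

def var_3_7(x1, x2):
    """Variance model, Equation (3.7), with rounded coefficients."""
    return 19.26 + 6.4 * x1 + 24.9 * x2 + 7.52 * x1**2 + 8.52 * x2**2 + 4.42 * x1 * x2

# All combinations of coded settings between -1 and 1 in steps of 0.01.
axis = np.linspace(-1.0, 1.0, 201)
X1, X2 = np.meshgrid(axis, axis, indexing="ij")

# LT (MSE) value for a minimization problem, Equation (3.8).
LT = mean_3_6(X1, X2) ** 2 + var_3_7(X1, X2)

# Settings with the smallest LT value.
i, j = np.unravel_index(np.argmin(LT), LT.shape)
best_x1, best_x2, best_lt = X1[i, j], X2[i, j], LT[i, j]
best_mean = mean_3_6(best_x1, best_x2)
best_std = np.sqrt(var_3_7(best_x1, best_x2))
```

A grid search is practical here only because there are two control variables; with more control variables a nonlinear programming routine would be the more natural choice.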

The  results  obtained  using  the  LT  formulation  differ  from  results  obtained  by 
Myers  &  Montgomery  (2002).  As  shown  in  Table  1,  these  authors  suggest  minimizing 
the  variance  and  then  choosing  the  associated  mean  value  or  using  contour  plots  to  locate 
an  appropriate  optimal  solution.  Figure  6  displays  the  contour  plot  for  the  mean  model 
and  Figure  7  depicts  the  contour  plot  of  the  variance  model  (in  terms  of  standard 
deviation).  Figure  8  overlays  these  two  plots  showing  the  region  of  settings  (highlighted 
oval)  for  the  control  variables,  as  suggested  by  Myers  &  Montgomery.  These  authors 
solved  the  optimization  problem  of  minimizing  the  variance  subject  to  the  mean  response 
being  less  than  or  equal  to  30.  Their  optimal  point  was  in  the  region  centered  around 
[0.25 , 0].  The  LT  solution  is  compared  to  the  Myers  &  Montgomery  solution  in  Table  5. 

Table 5. M&M Solution and LT Solution to Semiconductor Problem

Approach   x1     x2     Est. Mean   Est. Std Dev   Est. LT
M&M        0.25   0      29.797      4.62           909.22
LT         0.13   0.71   28.463      6.53           852.74

Figure 6. Mean Contour Plot

Figure 7. Variance Contour Plot

Figure 8. Overlay Contour Plot

The  LT  solution  achieved  a  smaller  mean  (28.46)  at  the  cost  of  a  slightly  higher 
variance  (standard  deviation  of  6.5).  Both  solutions  represented  “good”  solutions  by 
maintaining  low  variance  while  satisfying  the  constraint  on  the  mean  value.  Multiple 
approaches  that  may  lead  to  different  optimal  solutions  based  on  tradeoffs  inherent  in 
their  methodologies  were  presented  in  Table  1. 

This  concludes  a  review  on  calculating  robust  optimal  control  settings  using  the 
LT  formulation.  In  the  next  section,  an  approach  is  developed  to  find  control  variable 
settings  that  guard  against  future  system  degradation.  A  new  robust  solution  will  be 
developed to counteract the possibility of the system degrading quickly.

3.3 Robust Parameter Design and System Degradation

RPD assumes that the process model properly fits the system and that the
relationships between the system response, control variables, and noise variables do not
change with time. Ideally, after performing RPD, the system would work as predicted.
However, as stated by the second law of thermodynamics (Carnot et al., 2005), the
quality of matter/energy will deteriorate over time. Hence, it would be desirable to
develop modifications to RPD to account for the fact that physical systems will tend to
degrade over time, thus reducing the performance of the systems. The performance of
software systems can also be degraded by being exposed to inputs beyond the experience
of their design and training. If left unconsidered, this inevitable degradation can be
costly. In this research, system degradation is modeled to obtain new settings
that continue to remain robust to noise variables (traditional RPD) while becoming robust
to changes in the system that diminish performance.

3.3.1  Guarding  Against  System  Degradation 

In RPD, the mean performance and the variance of the performance are
approximated by low order polynomials. One method of finding a robust solution is to
combine these expressions into a composite expression for MSE (LT). This expression
becomes an objective function in a minimization problem. This philosophy was
presented in Sections 2.2.4 and 3.2.1.

If the system is suffering from performance degradation over time, it seems
reasonable to assume that the relationship between the control variables and the system's
mean performance and the variance of that performance will change. This, in turn,
suggests that estimated coefficients in the MSE (LT) composite would also change.

Loeffelholz & Bauer (2009) examined this phenomenon by performing an
experimental design (DOE) on the coefficients of the mean and variance models which
contained two control variables and two noise variables. These authors re-sampled data
in the Semiconductor Example problem based on Equation (3.5). A mean and variance
model was calculated, similar to those constructed in Equations (3.6) and (3.7). A full two
level factorial was performed on the mean and variance models and the results were
"crossed" with one another, resulting in 64 x 128 = 8192 total runs. The high and low
coefficient settings (coded as -1 and 1) for each run were calculated as a constant
percentage change ($\delta$) in each coefficient of the mean model, $\beta \pm \delta(\beta)$, and of the
variance model, $\gamma \pm \delta(\gamma)$. Each new set of coefficients populated an ensemble of
mean models and variance models, respectively. Some of these models exhibited
performance improvements, while others exhibited degraded performance.

An LT solution was calculated for all 8192 control variable coefficient
combinations. This resulted in LT values that were either higher or lower than the
original optimal solution. Only the combinations of coefficients that yielded LT values
greater (worse) than the original optimal solution were considered as possible candidate
coefficient sets that reflected system degradation. The authors chose the coefficient
settings which resulted in the worst LT value and obtained new optimal settings that were
designed to be more robust to the noise variables in the system as well as certain
perturbations in the system causing degradation.

To test these "doubly robust" settings, the system was tested under both normal
operating conditions and degraded conditions. Under normal conditions, the original
RPD settings outperformed the doubly robust settings, but the difference did not appear
significant. When the system was tested under various conditions that caused system
degradation, the doubly robust settings outperformed the original RPD settings
significantly. This method appeared effective, but the change in coefficients was
required to be constant for every term in the model, which led to a situation where the
actual system degradation, as measured by the increase in MSE (LT), was unpredictable
(Loeffelholz & Bauer, 2009).

There  are  numerous  locations  in  the  coefficient  space  of  the  mean  and  variance 
models  that  result  in  a  decrease  in  performance  (increased  LT  value).  A  method  is 
desired  to  guard  against  conditions  where  the  LT  criterion  has  increased  to  some  preset 
percentage  of  its  optimum  value.  One  way  to  find  such  conditions  is  to  follow  the 
direction  of  maximum  change  in  the  LT  function  (in  terms  of  the  coefficients,  given  the 
optimal  control  settings)  until  this  percentage  increase  is  realized.  This  is  accomplished 
in  the  coefficient  space  by  using  a  simple  gradient  search,  as  seen  in  Figure  9. 
Figure 9. Gradient Search for New Solution

Figure 9 depicts the optimal RPD settings as $x^*$, which correspond to the optimal
LT value ($Y^*$) at the appropriate coefficients, $C \in \mathbf{C}$, where $\mathbf{C}$ is the coefficient space.
The optimal LT value can also be represented as $Y^* = Y(C, x^*)$. A gradient search, based
upon $\left(\frac{\partial Y}{\partial C}\right)$, is performed in $\mathbf{C}$ until a specified percentage increase in LT ($Y^{**}$) is
realized. For a percentage increase of $x$ percent, this value is calculated as

$$ Y^{**} = \left(1 + \frac{x}{100}\right) Y^{*} $$

Prior to the gradient search, $C^{initial} = C^{old}$, and at the culmination of the gradient
search, $C^{degrade} = C^{new}$. The LT problem is then re-solved at this point ($C^{degrade}$) to
obtain the doubly robust settings, $x^{**}$.

For notational purposes, several terms are defined. Given the mean and variance
models in Equations (3.2) and (3.3), the general form of the LT (Y) function for a
minimization problem is:

$$ Y = \{E_{(z,\varepsilon)}[y(x,z)]\}^2 + V_{(z,\varepsilon)}[y(x,z)] \tag{3.9} $$

Equation (3.9) can be rewritten as (assuming $\sigma_z^2 = 1$):

$$ Y = \{\beta_0 + x'\beta + x'Bx\}^2 + (\gamma + \Delta'x)'(\gamma + \Delta'x) + \sigma^2 \tag{3.10} $$

For differentiating purposes, Equation (3.10) can be rewritten as:

$$ Y = \Phi + \Psi \tag{3.11} $$

For p control variables and q noise variables, the elements of Equation (3.11) are
defined in Equations (3.12) and (3.13). $\beta_{jk}$ represent the coefficients of the $B$ matrix
previously defined as the control by control interaction coefficients. $\delta_{jl}$ represent the
coefficients of the $\Delta$ matrix previously defined as the control by noise interaction
coefficients.

$$ \Phi = \Big\{\beta_0 + \sum_{j=1}^{p}\beta_j x_j + \sum_{j=1}^{p}\sum_{k=j}^{p}\beta_{jk}x_j x_k\Big\}^2 \tag{3.12} $$

$$ \Psi = \sum_{l=1}^{q}\Big(\gamma_l + \sum_{j=1}^{p}\delta_{jl}x_j\Big)^2 + \sigma^2 \tag{3.13} $$

A gradient search, at small steps, is conducted over the $\mathbf{C}$ space to locate the set
of coefficients that lead to a preset percentage degradation in LT performance. Since the
gradient is used, the degradation is achieved in the quickest possible fashion. The
partial derivative of the LT function with respect to each coefficient is calculated. In
general, differentiating $\Phi$ with respect to its coefficients yields:

$$ \frac{\partial \Phi}{\partial \beta_0} = 2\Big\{\beta_0 + \sum_{i=1}^{p}\beta_i x_i + \sum_{i=1}^{p}\sum_{h=i}^{p}\beta_{ih}x_i x_h\Big\} \tag{3.14} $$

$$ \frac{\partial \Phi}{\partial \beta_j} = 2\Big\{\beta_0 + \sum_{i=1}^{p}\beta_i x_i + \sum_{i=1}^{p}\sum_{h=i}^{p}\beta_{ih}x_i x_h\Big\}x_j \tag{3.15} $$

$$ \frac{\partial \Phi}{\partial \beta_{jk}} = 2\Big\{\beta_0 + \sum_{i=1}^{p}\beta_i x_i + \sum_{i=1}^{p}\sum_{h=i}^{p}\beta_{ih}x_i x_h\Big\}x_j x_k \tag{3.16} $$

In general, differentiating $\Psi$ with respect to its coefficients yields:

$$ \frac{\partial \Psi}{\partial \gamma_l} = 2\Big(\gamma_l + \sum_{i=1}^{p}\delta_{il}x_i\Big) \tag{3.17} $$

$$ \frac{\partial \Psi}{\partial \delta_{jl}} = 2\Big(\gamma_l + \sum_{i=1}^{p}\delta_{il}x_i\Big)x_j \tag{3.18} $$

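Therefore, differentiating Equation (3.10) relative to the original coefficients (C)
of Y yields the vector:

$$ \nabla Y = \frac{\partial Y}{\partial C} = \left[\frac{\partial Y}{\partial \beta_0}, \ldots, \frac{\partial Y}{\partial \beta_{pp}}, \frac{\partial Y}{\partial \gamma_1}, \ldots, \frac{\partial Y}{\partial \gamma_q}, \frac{\partial Y}{\partial \delta_{11}}, \ldots, \frac{\partial Y}{\partial \delta_{pq}}\right] \tag{3.19} $$

There are $\frac{p^2 + 3p + 2}{2}$ coefficients due to the mean model and $q + pq$ coefficients
due to the variance model. Therefore, the gradient vector, $\nabla Y$, contains r partial
derivatives:

$$ r = \left(\frac{p^2 + 3p + 2}{2}\right) + (q + pq) \tag{3.20} $$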

For a small step size $\xi$, and $C^{old} = C^{initial}$, the gradient search has the form:

$$ C^{new} = C^{old} + \xi(\nabla Y) \tag{3.21} $$

After each small step of the gradient search, the LT problem containing the new
perturbed coefficients needs to be re-solved:

$$ x^{new} = \arg\min_{x \in D} Y(C^{new}, x) \tag{3.22} $$

where D is the design space for x. Figure 9 depicts the gradient search in $\mathbf{C}$ space as
well as the mirrored sequence of optimal settings in the control variable space. This
process is repeated until the preset percentage degradation of LT performance ($Y^{**}$) is
realized. At this point, the LT problem is re-solved a final time to determine the final
optimal control settings. Figure 9 depicts these doubly robust settings as $x^{**}$, whereas the
original LT optimal settings are $x^*$. Figure 10 summarizes the algorithm used to find a
doubly robust solution.

Figure 10. Algorithm for Finding a Doubly Robust Operating Point
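
The loop summarized in Figure 10 can be sketched in Python. The following is a minimal illustration under several assumptions: the LT problem is re-solved by a grid search with step 0.01, the error variance is held fixed and excluded from the gradient, and all function and variable names are hypothetical. The coefficients are those of the re-sampled Semiconductor model in the example of Section 3.3.2. The sketch is not claimed to reproduce the settings reported in that example.

```python
import numpy as np

def mean_features(x):
    """Rows [1, x_1..x_p, x_j*x_k (j <= k)] for each setting in x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    p = x.shape[1]
    quad = [x[:, j] * x[:, k] for j in range(p) for k in range(j, p)]
    return np.column_stack([np.ones(len(x)), x] + quad)

def lt_value(c_mean, gamma, delta, sigma2, x):
    """LT value Y = mean^2 + variance, Equation (3.10), for each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = mean_features(x) @ c_mean
    slope = gamma + x @ delta
    return mean**2 + (slope**2).sum(axis=1) + sigma2

def lt_gradient(c_mean, gamma, delta, x):
    """Partial derivatives of Y with respect to the coefficients, Eqs. (3.14)-(3.18)."""
    f = mean_features(x)[0]
    mean = f @ c_mean
    slope = gamma + x @ delta
    return 2 * mean * f, 2 * slope, 2 * np.outer(x, slope)

def doubly_robust(c_mean, gamma, delta, sigma2, grid, pct=20.0, step=1e-3, max_iter=10_000):
    """Step the coefficients along the gradient until LT has degraded by pct percent."""
    c_mean, gamma, delta = (np.array(a, dtype=float) for a in (c_mean, gamma, delta))
    values = lt_value(c_mean, gamma, delta, sigma2, grid)
    x_orig, y_orig = grid[values.argmin()], values.min()
    x_new, y_new = x_orig, y_orig
    target = (1 + pct / 100) * y_orig
    for _ in range(max_iter):
        if y_new >= target:
            break
        g_mean, g_gamma, g_delta = lt_gradient(c_mean, gamma, delta, x_new)
        # Equation (3.21): move the coefficients in the direction of steepest increase.
        c_mean = c_mean + step * g_mean
        gamma = gamma + step * g_gamma
        delta = delta + step * g_delta
        # Equation (3.22): re-solve the LT problem with the perturbed coefficients.
        values = lt_value(c_mean, gamma, delta, sigma2, grid)
        x_new, y_new = grid[values.argmin()], values.min()
    return x_orig, y_orig, x_new, y_new

# Re-sampled Semiconductor model; mean terms ordered [1, x1, x2, x1^2, x1*x2, x2^2].
c_mean = [29.95, -3.16, -4.099, 2.895, 2.72, 2.33]
gamma = [2.67, -2.497, 2.38]
delta = [[-0.37, 1.06, 2.79],    # x1 by z1, z2, z3
         [2.04, -1.38, 1.85]]    # x2 by z1, z2, z3
axis = np.linspace(-1.0, 1.0, 201)
grid = np.array([(a, b) for a in axis for b in axis])

x_orig, y_orig, x_dr, y_dr = doubly_robust(c_mean, gamma, delta, 1.01, grid)
```

The last step may overshoot the target slightly, since the stopping rule is checked only after each full step; a smaller step size reduces the overshoot at the cost of more re-solves.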


3.3.2  Doubly  Robust  Solution  Example 

To  demonstrate  the  doubly  robust  algorithm  described  in  Section  3.3.1,  the 
previous  Semiconductor  Manufacturing  example  was  implemented.  To  test  the  doubly 
robust  solution,  a  Truth  Model  was  necessary  to  perform  confirmatory  trials.  Therefore, 
the  process  model  from  Equation  (3.5)  was  used  as  the  Truth  Model.  For  convenience, 
Equation  (3.5)  was  reproduced  as  Equation  (3.23)  below: 

$$
\begin{aligned}
y(x,z) ={}& 30.37 - 2.92x_1 - 4.13x_2 + 2.6x_1^2 + 2.18x_2^2 + 2.87x_1x_2 + 2.73z_1 - 2.33z_2 \\
& + 2.33z_3 - 0.27x_1z_1 + 0.89x_1z_2 + 2.58x_1z_3 + 2.01x_2z_1 - 1.43x_2z_2 + 1.56x_2z_3 + \varepsilon
\end{aligned}
\tag{3.23}
$$

The variance of the error term remained 0.9526, in order to obtain a different
process model during each re-sampling of $y(x,z)$. The same 23 run CCD was
implemented to create new response values. Quadratic regression was performed to
obtain the following process model, where $\varepsilon \sim N(0, \sqrt{1.01})$:

$$
\begin{aligned}
y(x,z) ={}& 29.95 - 3.16x_1 - 4.099x_2 + 2.895x_1^2 + 2.33x_2^2 + 2.72x_1x_2 + 2.67z_1 - 2.497z_2 \\
& + 2.38z_3 - 0.37x_1z_1 + 1.06x_1z_2 + 2.79x_1z_3 + 2.04x_2z_1 - 1.38x_2z_2 + 1.85x_2z_3 + \varepsilon
\end{aligned}
\tag{3.24}
$$

From $y(x,z)$, the mean model and variance models were computed as:

$$ E_{(z,\varepsilon)}[y(x,z)] = 29.95 - 3.16x_1 - 4.099x_2 + 2.895x_1^2 + 2.33x_2^2 + 2.72x_1x_2 \tag{3.25} $$

$$
\begin{aligned}
V_{(z,\varepsilon)}[y(x,z)] ={}& (2.67 - 0.37x_1 + 2.04x_2)^2 + (-2.497 + 1.06x_1 - 1.38x_2)^2 \\
& + (2.38 + 2.79x_1 + 1.85x_2)^2 + 1.01
\end{aligned}
\tag{3.26}
$$

This problem remained a minimization problem; thus, Equation (3.9) was used to calculate
LT values for every combination of control settings. This equation was:

$$
\begin{aligned}
LT ={}& (29.95 - 3.16x_1 - 4.099x_2 + 2.895x_1^2 + 2.33x_2^2 + 2.72x_1x_2)^2 \\
& + (2.67 - 0.37x_1 + 2.04x_2)^2 + (-2.497 + 1.06x_1 - 1.38x_2)^2 \\
& + (2.38 + 2.79x_1 + 1.85x_2)^2 + 1.01
\end{aligned}
\tag{3.27}
$$

The optimal (minimum) LT value selected yielded control settings [0.22, 0.60].
These settings are similar to the settings presented in Table 4, thus verifying the approach
taken with re-sampling data and performing quadratic regression on the model.

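A gradient search was then applied to Equation (3.27), where Y = LT, to determine
doubly robust settings. Equation (3.27) was derived from Equation (3.10), which is given
again as Equation (3.28):

$$ Y = \{\beta_0 + x'\beta + x'Bx\}^2 + (\gamma + \Delta'x)'(\gamma + \Delta'x) + \sigma^2 \tag{3.28} $$

For this particular example problem, two control variables and three noise variables were
used. Recall the gradient search calculates the partial derivatives for every term in Y with
respect to C. A partial derivative is taken for each coefficient in C, thus yielding a single
vector of partial derivatives. Note that each partial derivative is a scalar. Therefore, for r
terms in C, the general form of the partial derivatives in Equation (3.28) is:

$$
\frac{\partial Y}{\partial C} =
\begin{bmatrix}
\partial Y / \partial \beta_0 \\
\partial Y / \partial \beta_1 \\
\vdots \\
\partial Y / \partial \beta_p \\
\partial Y / \partial \beta_{11} \\
\vdots \\
\partial Y / \partial \beta_{pp} \\
\partial Y / \partial \gamma_1 \\
\vdots \\
\partial Y / \partial \gamma_q \\
\partial Y / \partial \delta_{11} \\
\vdots \\
\partial Y / \partial \delta_{pq}
\end{bmatrix}
=
\begin{bmatrix}
2\{\beta_0 + x'\beta + x'Bx\} \\
2\{\beta_0 + x'\beta + x'Bx\}x_1 \\
\vdots \\
2\{\beta_0 + x'\beta + x'Bx\}x_p \\
2\{\beta_0 + x'\beta + x'Bx\}x_1x_1 \\
\vdots \\
2\{\beta_0 + x'\beta + x'Bx\}x_px_p \\
2\{\gamma_1 + (\Delta'x)_1\} \\
\vdots \\
2\{\gamma_q + (\Delta'x)_q\} \\
2\{\gamma_1 + (\Delta'x)_1\}x_1 \\
\vdots \\
2\{\gamma_q + (\Delta'x)_q\}x_p
\end{bmatrix}
\tag{3.29}
$$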
This particular example contained

$$ r = \frac{2^2 + (3)(2) + 2}{2} + (3 + (2)(3)) = 15 $$

partial derivatives. After calculating all the partial derivatives, the coefficients in Equation
(3.27) were used as the starting point for the gradient search. Small step sizes ($\xi = 0.001$)
were implemented. Following the completion of a single step, the new LT (Y) problem
was solved to obtain optimal control settings. As outlined in Figure 10, this process was
repeated until $Y^{**} = 1023.28$ was achieved, which is defined as a 20 percent increase in
LT. This example ended the gradient search with doubly robust settings,
$x^{**} = [0.24, 0.33]$.

A summary of the original LT settings ($x^*$) and doubly robust settings ($x^{**}$) is
provided in Table 6.


Table 6. Original and Doubly Robust Settings

Settings        x1     x2
Original        0.22   0.6
Doubly Robust   0.24   0.33

The settings were quite different from one another (in terms of coded values), due
to the LT contour space in this problem being relatively flat around the original optimal
settings. Figure 11 depicts the LT contour plot for this problem across control settings of
[-1, 1] for each control variable. The large oval, centered around [0.22, 0.60], indicates
the settings within this region contain LT values very close to one another. This provides
some rationale as to why the doubly robust solution exhibits good performance under
nominal (non-degraded) operating conditions.

Figure 11. LT Contour Plot for Semiconductor Example Problem

To validate the doubly robust settings, an experiment was conducted in which
$\beta_0$ and $\sigma_z^2$ were varied in Equation (3.27). Varying these terms increases
the responses of the mean and variance models respectively, thus causing degraded
performance. At normal operating conditions, these values were 29.95 and 1.0
respectively. These values were increased, thus increasing the LT value, which in turn
represents system degradation. A range of [0, 10] was chosen for $\beta_0$ and
$\sigma_z$. For $\beta_0$, the range represented a constant added to the normal
operating condition values of the mean response. For $\sigma_z$, the range represented a
multiplicative effect (1 + increase) on the variance only for values greater than one,
thus increasing the variance.

Figure 12 displays the LT contours over the variation of the intercept term and
sigma value for the original LT settings, [0.22, 0.60]. Figure 13 displays the LT contours
for the doubly robust settings, [0.24, 0.33]. As expected, as $\beta_0$ and $\sigma_z^2$
increased, the LT solution at the given settings increased as well.

Figure 12. LT Contours for Original LT Settings

Figure 13. LT Contours for Doubly Robust Settings

To determine the validity of the doubly robust settings, the LT values shown in
Figures 12 and 13 were compared. Specifically, the calculation LT(x**) - LT(x*) was
performed. A positive number demonstrated that the Original settings (x*) outperform
the doubly robust settings (x**), and a negative number indicated the opposite. The
difference of LT contours between the two sets of settings is given in Figure 14.
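
The comparison behind Figure 14 can be sketched as follows. The sketch assumes the LT
criterion for a minimization problem is the squared mean plus the variance (Equation
(2.12) is not reproduced in this section, so this form is an assumption), and the mean
and variance values used in the demonstration are hypothetical placeholders rather than
values from the example.

```python
def lt_degraded(mean, var, d_intercept=0.0, d_sigma=0.0):
    """LT value after degrading the system.

    d_intercept is a constant added to the mean response; d_sigma is a
    fractional increase applied multiplicatively to the variance.
    Assumes LT = mean**2 + variance (minimization case).
    """
    return (mean + d_intercept) ** 2 + var * (1.0 + d_sigma)


def lt_difference(orig, robust, d_intercept, d_sigma):
    """LT(x**) - LT(x*); positive values favor the original settings."""
    lt_orig = lt_degraded(orig[0], orig[1], d_intercept, d_sigma)
    lt_rob = lt_degraded(robust[0], robust[1], d_intercept, d_sigma)
    return lt_rob - lt_orig


if __name__ == "__main__":
    # Hypothetical (mean, variance) pairs at x* and x**.
    original = (28.0, 60.0)
    doubly_robust = (28.2, 52.0)
    for d_sigma in (0.0, 1.0, 5.0):
        print(d_sigma, lt_difference(original, doubly_robust, 0.0, d_sigma))
```

In this toy setup the sign of the difference flips once the variance multiplier grows,
which is the qualitative pattern the contour comparison is meant to reveal.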

The dashed line in the figure represents the boundary which separates regions
where the two solutions are preferred. Above the dashed line (positive values), the
Original settings obtain a lower LT value; below the dashed line (negative values), the
doubly robust settings achieve lower LT values. Since the contours change very little
when the intercept term was adjusted, the intercept appeared to have little effect on
straying from the Original solution. However, once the $\sigma_z^2$ value was increased,
thus increasing the variance of the system, the doubly robust settings were preferred.

An important note to make is the scale of the contours, located on the z-axis of the
figure. Although the Original settings were preferred in the region of positive values,
the doubly robust settings were not far behind. The largest difference in which the
Original settings were preferred is 13, which occurred by inflating the intercept term
well beyond a necessary boundary. However, when increasing the sigma value in small
steps, the doubly robust settings drastically moved further away, in terms of LT value,
from the Original settings.
Figure 14. LT(x**) - LT(x*) Contour Plot (axes: Intercept and Sigma)

These  results  verify  the  algorithm  employed  to  determine  doubly  robust  settings. 
Not  only  are  the  settings  robust  to  noise  variables,  they  are  also  robust  to  perturbations  in 
the  system  causing  degradation  in  performance.  The  doubly  robust  settings  prove 
invaluable  if  a  user  is  uncertain  as  to  whether  the  system  being  employed  will  remain 
perfectly  functional  or  if  over  time  things  may  unknowingly  occur  reducing  performance. 
This  research  assumes  those  changes  in  the  system  are  unknown,  because  if  known,  RPD 
can  be  re-evaluated  to  obtain  new  settings  reflecting  the  changed  system. 


3.3.4  RPD  Summary 

Section 3.3 provided methodology on utilizing gradient analysis in a quadratic
regression framework to guard against system degradation. In terms of minimal or severe
degradation, the doubly robust settings proved to be more robust than the original LT
settings. Also, under normal operating conditions, the doubly robust settings proved
competitive.

Much  of  the  literature  cited  in  Chapter  2  addresses  different  issues  in  RPD  such  as 
solving  the  dual  response  problem.  However,  little  in  the  literature  suggests  alternative 
methods  for  deriving  mean  and  variance  models  other  than  the  use  of  quadratic 
regression. Myers & Montgomery (2002: 562) state “we do not mean to rule out the use
of interaction in noise or higher than quadratic terms... however, the model [in Section
2.2.3] will accommodate many real-life situations.” Section 3.4 explores an alternative:
artificial neural networks are implemented when quadratic regression fits the given data
poorly or significant lack of fit is realized.

3.4  Artificial  Neural  Networks 

Some  problems  may  be  highly  non-linear  in  nature  and  cannot  be  accurately 
modeled  by  quadratic  regression.  In  addition,  standard  RPD  methodologies  do  not  model 
interactions  between  noise  variables  which  may  be  important.  Properly  applied,  artificial 
neural  networks  (ANNs)  allow  for  the  modeling  of  higher  order  terms  and/or  noise 
variable  interactions,  thus  providing  the  flexibility  to  fit  non-linear  data.  Therefore,  the 
use  of  ANNs  to  model  the  process  model  and/or  the  mean  and  variance  models  appears 
appropriate. 

Radial Basis Function Neural Networks (RBFNNs) and Generalized Regression
Neural Networks (GRNNs) were selected for this research. These ANNs are
computationally efficient to train. However, if quadratic regression can model
the process model accurately, it should be utilized rather than ANNs due to its parsimony
and broader level of familiarity to the typical practitioner.


3.4.1  ANNs  and RPD 

Two approaches were developed to apply ANNs in RPD. In one approach the ANN
performs “post-processing” of the input data; in the other, it performs “pre-processing”.
The former uses control and noise variables as inputs, whereas the latter uses only
control variables as inputs. “Pre-processing” ANNs run faster but require the use of a
crossed array design to collect appropriate variance values. The “post-processing” ANNs
work well with either crossed or combined array designs. To aid in demonstration, a
notional crossed array design matrix ($3^2 \times 2^2$) containing two control
variables, x, and two noise variables, $Z = \{z_1, z_2\}$, with response, y, was created.


Table 7. Notional Design Matrix

             z1    -1    -1     1     1
             z2    -1     1    -1     1
  x1    x2
  -1    -1         47    13    10    30
  -1     0         44    93    81    91
  -1     1         53    66    37    58
   0    -1         93    55    39    96
   0     0         68    34    71    28
   0     1         26    23    50    66
   1    -1         74    23    34    13
   1     0          7     3    50    94
   1     1         63    28    80    78

The first ANN developed uses a neural network with a single response as the
output, where $A = A_{post}$:

$$ y = A(x; z) \tag{3.30} $$
The entire dataset in Table 7 was used, with the control and noise variables as the
inputs. The output response was averaged across all noise variable combinations (N) for
each unique combination of control settings. This value represents the expected value
for each combination of control settings, as denoted in Equation (3.31):

$$ \mu_y(x) = N^{-1} \sum_{z_i \in Z} A(x; z_i) \tag{3.31} $$

Variance values are calculated in the same manner by determining the variance
across all noise variable combinations for every control setting as denoted in Equation
(3.32):

$$ \sigma_y^2(x) = \frac{\sum_{z_i \in Z} A(x; z_i)^2 - \left(\sum_{z_i \in Z} A(x; z_i)\right)^2 / N}{N - 1} \tag{3.32} $$


In  this  example,  the  result  would  be  nine  control  variable  combinations,  each  with 
an  expected  value  and  variance  value. 
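
The two formulas can be illustrated on the notional data. In the sketch below the
observed responses from Table 7 stand in for the network predictions A(x; z_i); that
substitution is an assumption made only so the example is self-contained, and the
function name `mean_and_variance` is hypothetical.

```python
import numpy as np

# Responses from Table 7: one row per control setting (x1, x2), one column per
# noise combination (z1, z2) = (-1,-1), (-1,1), (1,-1), (1,1).
controls = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0),
            (0, 1), (1, -1), (1, 0), (1, 1)]
responses = np.array([
    [47, 13, 10, 30],
    [44, 93, 81, 91],
    [53, 66, 37, 58],
    [93, 55, 39, 96],
    [68, 34, 71, 28],
    [26, 23, 50, 66],
    [74, 23, 34, 13],
    [7, 3, 50, 94],
    [63, 28, 80, 78],
], dtype=float)


def mean_and_variance(a):
    """Apply Equations (3.31) and (3.32) row by row."""
    n = a.shape[1]                       # N noise combinations per control setting
    total = a.sum(axis=1)
    mean = total / n
    variance = ((a ** 2).sum(axis=1) - total ** 2 / n) / (n - 1)
    return mean, variance


mean, variance = mean_and_variance(responses)
for (x1, x2), m, v in zip(controls, mean, variance):
    print(f"x1={x1:2d} x2={x2:2d} mean={m:6.2f} variance={v:8.2f}")
```

The variance expression is the usual sample variance written in its computational form,
so it agrees with `numpy.var(..., ddof=1)`.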
Figure 15. Approach 1 to Develop ANN for RPD
The previous approach requires a large number of inputs into the ANN due to the
inclusion of noise variables. To account for this, “pre-processing” can be done to reduce
the number of inputs, but this approach requires a crossed array design. The next ANN
($A_{pre}$) constructed takes a single neural network and provides two outputs: mean and
variance. Prior to executing the ANN, the mean (Equation (3.31)) and variance
(Equation (3.32)) of each row in Table 7 were calculated, which represents the
“pre-processing” stage. The unique control settings were used as the inputs of each model
with their respective outputs. This notional data is displayed in Table 8.


Table 8. Pre-Processed Data for Mean and Variance

  x1    x2    y (mean)   y (variance)
  -1    -1       25          300
  -1     0       77          520
  -1     1       54          153
   0    -1       71          800
   0     0       50          497
   0     1       41          433
   1    -1       36          701
   1     0       39         1833
   1     1       62          578

This network is efficient due to the small number of input variables. However,
one drawback is the requirement of sufficient data to accurately model the variance,
which suggests the use of a crossed array design. If only one combination of noise
variable settings is taken for each combination of control settings, insufficient data
exists for variance estimation. This ANN has two outputs. Alternatively, two networks
with one output each could be used.
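
A two-output network of this kind can be sketched with a small GRNN written directly in
NumPy. This is not the MATLAB implementation used in the research; it assumes the
standard GRNN form (a Gaussian-weighted average of the training targets) with the bias
computed as 0.8326/spread, and the spread value of 0.3 is an arbitrary illustrative
choice.

```python
import numpy as np

# Pre-processed data from Table 8: inputs (x1, x2), outputs (mean, variance).
centers = np.array([(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0),
                    (0, 1), (1, -1), (1, 0), (1, 1)], dtype=float)
targets = np.array([(25, 300), (77, 520), (54, 153), (71, 800), (50, 497),
                    (41, 433), (36, 701), (39, 1833), (62, 578)], dtype=float)


def grnn_predict(x_new, centers, targets, spread):
    """Return (mean, variance) predictions as a weighted average of targets."""
    b = 0.8326 / spread                                   # bias from the spread
    dist_sq = ((centers - np.asarray(x_new, dtype=float)) ** 2).sum(axis=1)
    c = np.exp(-(b ** 2) * dist_sq)                       # hidden-layer outputs
    return c @ targets / c.sum()                          # one value per output


prediction = grnn_predict((0.0, 0.0), centers, targets, spread=0.3)
print(prediction)
```

With a small spread the prediction at a training point stays close to that point's own
targets; larger spreads smooth across neighboring control settings.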
3.4.2  Semiconductor  Extended  Example  Using  ANNs  in  RPD

The  example  applied  in  Section  3.2.1  was  adjusted  to  represent  a  more  difficult 
non-linear  problem.  Terms  were  added  to  the  model  to  include  third  and  fourth  order 
terms  as  well  as  interactions  between  noise  variables.  The  following  model  was  used  as 
the  Truth  Model: 


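$$
\begin{aligned}
y(x, z) ={}& 30.37 - 2.92x_1 - 4.13x_2 + 2.6x_1^2 + 2.18x_2^2 + 2.73z_1 - 2.33z_2 \\
&+ 2.33z_3 - 0.27x_1z_1 + 0.89x_1z_2 + 2.58x_1z_3 + 2.01x_2z_1 - 1.43x_2z_2 + 1.56x_2z_3 \\
&+ 3.8094x_1^4 + 2.163x_2^4 + 2.9954x_1^3 + 4.8661x_1^2x_2^2 + 3.4496x_1^2x_2 + 2.0059x_1x_2^3 \\
&- 0.0085z_1^2 + 3.56z_2^2 + 2.8541z_3^2 + 1.3269z_2^4 + 2.624z_1z_2 - 4.5689z_2z_3 + \varepsilon
\end{aligned}
\tag{3.33}
$$

$$ \varepsilon \sim \mathrm{Normal}(0, \ldots) $$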


To determine the true optimal settings for the Truth Model, an exhaustive search
was performed over all combinations of control variables and noise variables at a step
size of 0.01, using coded levels. For each combination of control variable settings, the
response was averaged (mean) and the variance across all noise variable settings was
calculated. These values were then applied to the LT formulation for a minimization
problem (Equation (2.12)). This process was replicated 100 times, and the optimal LT
settings are given in Figure 17. After obtaining 100 control settings and associated LT
values, the results were averaged and are reported in Table 9. These optimal settings
provide a basis for comparison of results obtained through ANNs versus quadratic
regression.
Figure 17. Scatterplot of Optimal LT Settings over 100 Replications


Table 9. True Optimal Settings for Equation (3.33)

  x1     x2     Actual LT
  0.24   0.48   1382.4

To re-sample data from the Truth Model (Equation (3.33)), a $3^2 \times 2^3$ crossed
array design was utilized to support both ANN approaches outlined in this research.
After obtaining response values for each treatment row in the crossed array design,
quadratic regression was performed to fit a process model. The mean and variance models
were computed, thus allowing for calculation of LT values for all control settings
between [-1, 1]. Quadratic regression yielded optimal control variable settings of
[1, 0.31]. The actual LT value of these control settings was obtained from the exhaustive
search performed on the Truth Model. These settings corresponded to an LT value of 2108,
which is 52.5 percent larger than the LT value of the true optimal settings.

To understand how well quadratic regression performs, Table 10 displays the
ANOVA table for this example. The ANOVA table displays a very large p-value, so the
current quadratic model is not significant. The significance level against which the
p-value is compared is typically $\alpha = 0.10$ or $\alpha = 0.05$.


Table 10. ANOVA for Semiconductor Extended Example

Source      Sum of Squares   df   Mean Square   F Value   p-value (Prob > F)
Model              1525.40   14        108.96      0.77   0.6988  not significant
Residual           8096.91   57        142.05
Cor Total          9622.31   71

The coefficient of determination, or $R^2$, provides insight into the amount of
variability of the data set captured by the model (Montgomery et al., 2004). Along with
adjusted $R^2$, these values indicate how well the model fits the given data. This
particular model obtained an $R^2 = 0.1585$, indicating a poor fit.
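
The reported summary statistics follow directly from the ANOVA entries. The short check
below uses only the sums of squares and mean squares printed in Table 10; the variable
names are illustrative.

```python
# Values as printed in Table 10.
ss_model, ss_residual, ss_total = 1525.40, 8096.91, 9622.31
ms_model, ms_residual = 108.96, 142.05

r_squared = ss_model / ss_total      # share of total variability explained
f_value = ms_model / ms_residual     # model mean square over residual mean square

print(f"R^2 = {r_squared:.4f}")
print(f"F   = {f_value:.2f}")
```

An F value below one means the model mean square is smaller than the residual mean
square, which is consistent with the large p-value in the table.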

To demonstrate how poorly the quadratic regression fits the given data, since the
Truth Model is known, the predicted model can be explored. Table 11 provides the
coefficients for the predicted model when sampling data from Equation (3.33), as well as
the true coefficients. These estimated coefficients are drastically different from the
true coefficients.
Table 11. Coefficients for Semiconductor Extended Example

Factor          Estimate    Truth
Intercept          46.27    30.37
x1                 -0.10    -2.92
x2                 -1.10    -4.13
z1                 -1.19     2.73
z2                  2.08    -2.33
z3                  0.94     2.33
x1x2               -0.24     0.00
x1z1                1.82    -0.27
x1z2               -2.60     0.89
x1z3                2.18     2.58
x2z1                0.17     2.01
x2z2                0.01    -1.43
x2z3                2.15     1.56
x1^2               -0.11     2.60
x2^2                1.84     2.18
x1^4                0.00     3.81
x2^4                0.00     2.16
x1^3                0.00     2.99
(x1^2)(x2^2)        0.00     4.86
(x1^2)(x2)          0.00     3.45
(x1)(x2^3)          0.00     2.01
z1^2                0.00    -0.01
z2^2                0.00     3.56
z3^2                0.00     2.85
z2^4                0.00     1.32
z1z2                0.00     2.62
z2z3                0.00    -4.57

Along  with  the  ANOVA  table  and  summary  statistics,  several  residual  plots  were 
examined.  Figure  18  presents  the  normal  probability  plot  for  this  example.  The  plot 
shows  a  light  tailed  distribution  on  the  ends.  This  could  indicate  several  outliers 
“pulling”  the  least  squares  estimates  from  their  true  values. 
Figure 18. Normal Probability Plot for Semiconductor Extended Example

The plot for predicted values versus actual values is displayed in Figure 19. This
plot helps determine the model’s predictability, given new observations. This plot allows
a visual interpretation of the predicted $R^2$ value. As seen in Figure 19, this model
poorly predicts new observations.
Figure 19. Predicted vs Actual for Semiconductor Extended Example

Analysis in Tables 10-11 and Figures 18-19 indicates that the robust optimal
settings obtained through quadratic regression may not truly represent the optimal
settings. In fact, according to Table 9, which gives the optimal settings for the
Semiconductor Extended example, the quadratic regression settings are nowhere near the
optimal. Thus, this analysis indicates that the use of ANNs may be appropriate to better
model Equation (3.33).

When developing an ANN, a spread parameter must be defined. Sections 2.3.2
and 2.3.3 defined the spread parameter as it pertains to RBFNNs and GRNNs.
MATLAB® calculates this parameter as 0.8326/spread, where spread is set by the user
(typically a default of 0.1 for RBFNNs and 1.0 for GRNNs). Prior to analysis, a small set
of the data was withheld to tune the spread; the resulting value could change depending
on the holdout dataset. Once the spread was determined for both the RBFNN and GRNN, the
two ANN approaches (1 = post-processing, 2 = pre-processing) were applied to the full
dataset to determine robust settings, assuming a minimum LT existed. Table 12 reports
the true optimal and quadratic regression settings as well as the ANN results. As seen
in Table 12, the ANNs outperform quadratic regression regardless of the approach or type
of neural network used. In fact, the RBFNNs obtain the true optimal in each instance.
These results demonstrate the potential of ANNs when quadratic regression fails to
properly model the problem.
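
The constant 0.8326 is approximately the square root of ln 2, so a Gaussian unit
exp(-n^2) with bias 0.8326/spread outputs about 0.5 when an input lies exactly one
spread away from a center. The sketch below checks that relationship and illustrates a
holdout search over candidate spreads. The GRNN here is a plain NumPy stand-in rather
than the MATLAB routine, and the one-dimensional data, the candidate list, and the
function names are all hypothetical.

```python
import math
import numpy as np


def radbas(n):
    """Gaussian radial basis transfer function exp(-n^2)."""
    return math.exp(-n * n)


def grnn(x, centers, targets, spread):
    """Single-output GRNN prediction at one point x."""
    b = 0.8326 / spread
    c = np.exp(-(b * np.linalg.norm(centers - x, axis=1)) ** 2)
    return float(targets @ c / c.sum())


def choose_spread(train_x, train_y, hold_x, hold_y, candidates):
    """Return the candidate spread with the smallest holdout squared error."""
    def sse(spread):
        preds = np.array([grnn(x, train_x, train_y, spread) for x in hold_x])
        return float(np.sum((preds - hold_y) ** 2))
    return min(candidates, key=sse)


# Hypothetical data: a quadratic sampled on a grid, midpoints held out.
train_x = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)
train_y = train_x[:, 0] ** 2
hold_x = (train_x[:-1] + train_x[1:]) / 2.0
hold_y = hold_x[:, 0] ** 2
candidates = [0.05, 0.1, 0.25, 0.5, 1.0]

half_response = radbas((0.8326 / 0.1) * 0.1)   # distance equal to the spread
best = choose_spread(train_x, train_y, hold_x, hold_y, candidates)
print(half_response, best)
```

Because the selected spread depends on which points are held out, a different holdout
set may select a different value, as the text notes.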


Table 12. ANN Results on Semiconductor Extended Example

Method            x1     x2     Actual LT
Actual            0.24   0.48   1382
QR                1      0.31   2108
ANN 1   RBFNN     0.24   0.48   1382
        GRNN      0.18   0.52   1383
ANN 2   RBFNN     0.24   0.48   1382
        GRNN      0.24   0.48   1382

3.4.3  Koksoy  Problem  Using  ANNs  in  RPD 

The Semiconductor Extended example in Section 3.4.2 used a crossed array
design, thus providing more data points. That crossed array design was tailored for
ANN approach 2. An example was adapted from Koksoy (2008) that uses a combined
array design for three control variables and two noise variables. A CCD consisting of 25
runs was used, and response values were provided for two outputs. $Y_1$ needed to
achieve a target value of 1.0 while $Y_2$ was to be minimized ($\geq 0$). Table 13
displays the design and responses for this example problem.
Table 13. Example Problem 2 (Koksoy, 2008)

The experimental results for the force transducer experiment

Run    x1    x2    x3    z1    z2    y1      y2
  1    -1    -1    -1    -1     1    1.81    1.10
  2    -1    -1    -1     1    -1    1.69    1.11
  3    -1    -1     1    -1    -1    1.90    1.07
  4    -1    -1     1     1     1    1.78    1.07
  5    -1     1    -1    -1    -1    1.80    1.47
  6    -1     1    -1     1     1    1.63    1.18
  7    -1     1     1    -1     1    1.92    1.41
  8    -1     1     1     1    -1    1.78    1.58
  9     1    -1    -1    -1    -1    1.36    1.57
 10     1    -1    -1     1     1    1.22    2.03
 11     1    -1     1    -1     1    1.48    1.38
 12     1    -1     1     1    -1    1.44    1.68
 13     1     1    -1    -1     1    0.693   3.37
 14     1     1    -1     1    -1    0.616   3.75
 15     1     1     1    -1    -1    0.950   2.81
 16     1     1     1     1     1    0.817   2.83
 17    -1     0     0     0     0    1.79    1.24
 18     1     0     0     0     0    1.03    2.46
 19     0    -1     0     0     0    1.53    1.23
 20     0     1     0     0     0    1.22    1.73
 21     0     0    -1     0     0    1.30    1.63
 22     0     0     1     0     0    1.44    1.67
 23     0     0     0     0     0    1.38    1.73
 24     0     0     0     0     0    1.39    1.74
 25     0     0     0     0     0    1.40    1.74

The two responses, $Y_1$ and $Y_2$, were treated as separate problems simply to
demonstrate the ANNs’ ability to model problems with combined array designs. Optimal
settings were not provided in Koksoy (2008), but can be estimated by examining Table
13. Table 14 provides the estimated optimal robust settings for the two problems.


Table 14. Estimated Optimal Settings for Koksoy Example

Response    x1    x2    x3    Est. Mean
Y1           1     0     0    1.03
Y2          -1    -1     1    1.07

Although both ANN methods were performed on Example 2, this problem is
more suitable for approach 1. Results are shown from approach 1, which uses a single
neural network with one response. The mean and variance are extracted from responses of
control settings across noise variables. Quadratic regression was also performed for
comparison.
The results obtained from the ANNs (RBFNNs and GRNNs obtained the same
results) and quadratic regression for the two responses are displayed in Table 15. RPD
was performed to create both a mean model and variance model, thus calculating the LT
value, but only the estimated mean is reported in Table 15. The ANN achieved the
optimal settings for both problems even though a small combined array design was
implemented. Quadratic regression was further from the optimal solution, which is
additional evidence supporting the use of ANNs even in the presence of small data sets.


Table 15. Settings for ANN and QR for Example 2

Method      x1    x2    x3    Est. Mean
QR-Y1        1     0    -1    0.972
ANN-Y1       1     0     0    1.03
QR-Y2       -1    -1    -1    1.11
ANN-Y2      -1    -1     1    1.07

3.4.4  Doubly  Robust  Operating  Points  Using  ANNs 

Methodology  was  covered  in  Sections  3.3.2  and  3.3.3  to  guard  against  system 
degradation  by  utilizing  a  gradient  search  in  the  coefficient  space.  Settings  were 
calculated  which  were  robust  against  noise  variables  and  robust  against  perturbations  in 
the  system  causing  performance  degradation.  These  points  are  called  “doubly  robust” 
operating  points.  In  this  section,  gradient  analysis  is  applied  to  ANNs  when  quadratic 
regression  is  unsuitable. 

The ANNs developed in Section 3.4.1 output a mean and variance value for
control settings. For gradient analysis, it is necessary to construct an ANN that outputs
LT values based on control settings as inputs. Therefore, an extra step is taken to
transform the results of the ANNs outputting mean/variance values to construct a new
ANN ($A_{LT}$), outputting LT values.

After obtaining the mean ($\mu_y(x)$) and variance ($\sigma_y^2(x)$) values, regardless
of the ANN approach (pre/post) utilized, the LT values are calculated for the control
settings. The control settings become the inputs for the new ANN ($A_{LT}$) with the LT
values as the expected response. Gradient analysis is now appropriate.


For quadratic regression, the gradient of the LT (F) function was found with
respect to the coefficients of the mean model and variance model. ANNs are
parameterized by weights rather than regression coefficients as found in the quadratic
models. Figure 20 depicts the ANN to solve for the robust optimal settings (x*) in the
design space (D) of x, under normal operating conditions ($w^{initial}$). This involves
solving:

$$ x^{*} = \arg\min_{x \in D} z\left(w^{initial}, x\right) \tag{3.34} $$

After solving for the robust optimal settings in Equation (3.34), a gradient search
is then performed. To follow the gradient of an ANN, partial derivatives of the output
with respect to the weights, $\left(\frac{\partial z}{\partial W}\right)$, need to be
calculated. Obtaining these partial derivatives allows one to follow the procedure
outlined in Figure 23, where $\nabla z = \left(\frac{\partial z}{\partial W}\right)$.
The original weights ($w^{initial}$) are used as the starting point and reassigned as
$w^{old}$. For a small step size, $\xi$, the gradient search is written as:

$$ w^{new} = w^{old} + \xi(\nabla z) \tag{3.35} $$

Following a step in the gradient direction, the optimal control settings are solved:

$$ x^{**} = \arg\min_{x \in D} z\left(w^{new}, x\right) \tag{3.36} $$
This process is repeated until a preset percent increase in LT (z**) is realized.
The final result yields control settings (x**) that are robust to noise variables and
robust to perturbations in the system causing performance degradation.
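
The loop described above (solve for the optimum, step the weights along the gradient,
re-solve, and stop at a preset LT increase) can be sketched for an RBFNN-style output.
This is a minimal illustration, not the implementation used in this research: it steps
only the output-layer weights, for which the partial derivative of z with respect to
w_j is the hidden-node output, it replaces the optimizer with a search over a fixed grid,
and the centers, weights, bias, and step size are hypothetical values chosen so the toy
problem has a positive minimum.

```python
import itertools
import numpy as np


def hidden_matrix(points, centers, b1):
    """Hidden-node outputs exp(-sum_k (b1 (x_k - mu_k))^2) for every point/center."""
    diff = points[:, None, :] - centers[None, :, :]
    return np.exp(-(b1 ** 2) * (diff ** 2).sum(axis=2))


def doubly_robust_search(w, b2, centers, b1, grid, step,
                         increase=0.20, max_iter=100_000):
    """Step weights along dz/dw until the minimum LT rises by `increase`."""
    h = hidden_matrix(grid, centers, b1)   # fixed: centers and b1 never change
    w = np.asarray(w, dtype=float).copy()
    i = int(np.argmin(h @ w + b2))         # robust optimum under initial weights
    x_star, z_star = grid[i], float(h[i] @ w + b2)
    target = (1.0 + increase) * z_star     # assumes z_star > 0
    z_cur = z_star
    for _ in range(max_iter):
        if z_cur >= target:
            break
        w = w + step * h[i]                # gradient step at the current optimum
        z = h @ w + b2
        i = int(np.argmin(z))              # re-solve for the optimal settings
        z_cur = float(z[i])
    return x_star, z_star, grid[i], z_cur


# Hypothetical toy network: nine centers on a 3 x 3 grid, negative weights and a
# large output bias so that the LT surface is bowl-like and strictly positive.
axis = np.linspace(-1.0, 1.0, 21)
grid = np.array(list(itertools.product(axis, axis)))
centers = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))
w0 = np.array([-100, -150, -100, -200, -600, -250, -100, -300, -100], dtype=float)

x_star, z_star, x_robust, z_robust = doubly_robust_search(
    w0, 2000.0, centers, 0.8326, grid, step=0.5)
print(x_star, z_star, x_robust, z_robust)
```

Because every hidden-node output is positive, each step raises the predicted LT at every
grid point, so the minimum increases monotonically until the stopping threshold is met.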

Herein RBFNNs and GRNNs are examined as neural network options. In the
following section, the gradient vector is derived first for an RBFNN and then for a
GRNN.

Recall from Section 2.3.2 that the formulation for RBFNNs to calculate an output (z)
was:

$$ z = \sum_{j=1}^{p} w_j \exp\left(-\sum_{k=1}^{n} \left(b_1 \left(x_k - \mu_k^{(j)}\right)\right)^2\right) + b_2 \tag{3.37} $$

To recap, p centers exist for n features (or input variables). $x_k$ represents the kth
input feature/variable of the new exemplar and $\mu_k^{(j)}$ represents the kth component
of the jth center. $w_j$ is equal to the weight of the jth node. The initial bias term
($b_1$) represents MATLAB’s® interpretation of applying the spread in the equation, which
is calculated as 0.8326/spread. Also, a second bias term ($b_2$) is appended to represent
a linear layer bias term. Figure 21 represents the RBFNN as outlined in Equation (3.37).
The $h_p$ nodes represent each hidden layer node in Equation (3.37).
the input x with its output as z. The initial bias term is calculated the same as in the
RBFNN. Figure 22 represents the GRNN outlined in Equation (3.41), where $\alpha$ denotes
the numerator and $\beta$ denotes the denominator.

Figure 22. GRNN with Single Output (LT) (layers, left to right: Inputs, Hidden Layer,
Summation Layer, Division Layer, Outputs)
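For gradient analysis, Equation (3.41) can be rewritten as:

$$ z = \frac{\sum_{j=1}^{p} w_j c_j}{\sum_{j=1}^{p} c_j} \tag{3.42} $$

where

$$ c_j = \exp\left(-\sum_{k=1}^{n} \left(b_1 \left(x_k - \mu_k^{(j)}\right)\right)^2\right) \tag{3.43} $$

The partial derivative of z with respect to the weights,
$\left(\frac{\partial z}{\partial W}\right)$, is calculated as:

$$ \frac{\partial z}{\partial w_j} = \frac{c_j}{\sum_{i=1}^{p} c_i} \tag{3.44} $$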


After  obtaining  the  partial  derivatives  for  RBFNNs,  Equation  (3.40),  and  GRNNs, 
Equation  (3.44),  gradient  analysis  can  be  conducted  as  outlined  in  Figure  23  and 
Equations  (3.35)-(3.36). 
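
A quick numerical check of the GRNN weight gradient is sketched below. It assumes the
GRNN output takes the normalized form z = (sum of w_j c_j) / (sum of c_j), with c_j the
Gaussian hidden-node output; under that assumption the derivative with respect to w_j is
c_j divided by the sum of all c_i. The centers, weights, and evaluation point are
hypothetical, and the function names are illustrative.

```python
import numpy as np


def grnn_terms(x, centers, b1):
    """Hidden-node outputs c_j for a single input x."""
    return np.exp(-(b1 ** 2) * ((centers - x) ** 2).sum(axis=1))


def grnn_output(w, x, centers, b1):
    """Normalized GRNN output: weighted average of the weights w."""
    c = grnn_terms(x, centers, b1)
    return float(w @ c / c.sum())


def grnn_grad_w(x, centers, b1):
    """Analytic dz/dw_j = c_j / sum_i c_i (does not depend on w)."""
    c = grnn_terms(x, centers, b1)
    return c / c.sum()


# Hypothetical network and evaluation point.
centers = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [1.0, -1.0]])
w = np.array([1500.0, 1400.0, 1600.0, 1700.0])
x = np.array([0.24, 0.48])
b1 = 0.8326 / 1.0

analytic = grnn_grad_w(x, centers, b1)

# Central finite differences, one weight at a time.
eps = 1e-3
numeric = np.zeros_like(w)
for j in range(len(w)):
    bump = np.zeros_like(w)
    bump[j] = eps
    numeric[j] = (grnn_output(w + bump, x, centers, b1)
                  - grnn_output(w - bump, x, centers, b1)) / (2 * eps)

print(analytic, numeric)
```

Since the output is linear in the weights, the gradient entries sum to one and do not
change as the weights are stepped; only a move to a new optimal x changes them.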
Figure  23.  Algorithm  for  System  Degradation  in  ANNs
Recall that in the Semiconductor Extended example in Section 3.4.2, the system
response was defined as:

$$
\begin{aligned}
y(x, z) ={}& 30.37 - 2.92x_1 - 4.13x_2 + 2.6x_1^2 + 2.18x_2^2 + 2.73z_1 - 2.33z_2 \\
&+ 2.33z_3 - 0.27x_1z_1 + 0.89x_1z_2 + 2.58x_1z_3 + 2.01x_2z_1 - 1.43x_2z_2 + 1.56x_2z_3 \\
&+ 3.8094x_1^4 + 2.163x_2^4 + 2.9954x_1^3 + 4.8661x_1^2x_2^2 + 3.4496x_1^2x_2 + 2.0059x_1x_2^3 \\
&- 0.0085z_1^2 + 3.56z_2^2 + 2.8541z_3^2 + 1.3269z_2^4 + 2.624z_1z_2 - 4.5689z_2z_3 + \varepsilon
\end{aligned}
\tag{3.45}
$$

$$ \varepsilon \sim \mathrm{Normal}(0, \ldots) $$

The optimal settings obtained for this problem using RBFNNs and GRNNs were
[0.24, 0.48] with an associated LT value of 1382.4. The weights associated with these
ANNs were extracted and implemented as the starting point ($w^{initial}$) for the
gradient search. Steps of size 0.01 in the gradient direction were taken until a 20
percent degradation in LT (z**) was realized, which computes as 1659.

Following the algorithm outlined in Figure 23, doubly robust settings (x**) were
calculated. These settings corresponded to [0.18, 0.58] for the RBFNNs and [0.20, 0.58]
for the GRNNs. Since a fourth order model was utilized, it was difficult to test the
doubly robust conditions under system degradation. For the quadratic regression, one
was able to adjust the intercept of the mean model and multiply the variance by a
constant. However, literature is scarce as to how to obtain mean and variance models for
situations greater than quadratic.

Therefore, these settings were tested against the original settings under normal
operating conditions (Equation (3.45)). The solutions for the original settings and their
actual LT values are reported in Table 12. This table is extended to include the doubly
robust settings and their actual LT values under normal operating conditions. This is
displayed in Table 16.


Table 16. ANN Doubly Robust Settings under Normal Operating Conditions

Method                   x1     x2     Actual LT
Actual                   0.24   0.48   1382
QR                       1      0.31   2108
ANN 1          RBFNN     0.24   0.48   1382
               GRNN      0.18   0.52   1383
ANN 2          RBFNN     0.24   0.48   1382
               GRNN      0.24   0.48   1382
Doubly Robust  RBFNN     0.18   0.58   1386
               GRNN      0.2    0.58   1390

The LT values of the doubly robust settings for RBFNNs and GRNNs computed
to only a 0.29 and 0.33 percent increase, respectively, in expected LT value from the
original settings under normal system operation. This result indicated that selecting
doubly robust settings maintained near optimal results under normal conditions. As seen
with system degradation in quadratic regression, these settings should be more
robust to system degradation, within the specified LT increase bound. ANNs proved
useful in fitting non-linear problems, and their use has been adapted to guard against
system degradation.


3.5  Multiple  Responses 

Real world problems often involve measuring multiple responses. Difficulties
arise when the optimal choice of settings differs from one response to another. For
example, the Koksoy (2008) example in Section 3.4.3 involved two responses, and the
optimal settings for each response were on different ends of the spectrum in terms of
settings. This issue becomes increasingly difficult given more response variables.
Section 2.4 discussed approaches to multiple response problems, but most involve subject
matter expertise for weighting or subjective decision making based on contour plots. To
account for this problem, factor analysis (FA) is implemented to discover if the
responses can be projected into a meaningful subspace.

3.5.1  Factor  Analysis  for  Multiple  Responses 

Traditionally,  factor  analysis  is  performed  on  features  to  detennine  commonalities 
for  feature  reduction.  In  this  research,  the  same  idea  is  applied  to  the  responses  rather 
than  the  features  (input  variables).  Applying  factor  analysis  reduces  the  number  of 
responses  to  reflect  common  factors  (responses).  The  factor  scores  generated  represent 
the  new  response  variable(s).  Furthermore,  reduction  to  a  single  factor  allows  quadratic 
regression  or  ANNs  to  be  applied  to  find  the  optimal  settings  for  control  variables. 

However, factor analysis may only reduce the problem to two or more factors
rather than a single factor. The number of responses may have been reduced, but the
issue of multiple responses remains. To combat this problem, linear combination
techniques are employed to combine factor scores from multiple dimensions into a single
dimension. Rotated factor scores were also created using the MATLAB® function
rotatefactors.

The simplest method to combine the multiple (rotated) factor scores is by
addition/subtraction. Signs are attached to the factor scores appropriately to minimize the
overall response. For example, if a high factor score is desirable for a particular factor, a
negative sign is placed on the factor. Equation (3.46) summarizes this technique for n
factors, where $f_i$ represents the factor score for factor i and $f_i^{(R)}$ represents the rotated
factor score for rotated factor i:

$$Y = \sum_{i=1}^{n} f_i \quad \text{or} \quad Y = \sum_{i=1}^{n} f_i^{(R)} \tag{3.46}$$

The second linear combination technique applies weights to the (rotated) factor
scores. As opposed to subjective weights, eigenvalues contain information on factor
importance. The eigenvalues are normalized to represent a value between 0 and 1. Each
(rotated) factor is then multiplied by its normalized eigenvalue to produce weighted
(rotated) factor scores. Equation (3.47) depicts this method, where $\lambda_i$ represents the
weight (normalized eigenvalue) for factor i:

$$Y = \sum_{i=1}^{n} \lambda_i f_i \quad \text{or} \quad Y = \sum_{i=1}^{n} \lambda_i f_i^{(R)} \tag{3.47}$$

The third method adjusts the (rotated) factor scores to reflect the same
scale. To achieve this, the (rotated) factor scores are normalized. Equation (3.48)
outlines the methodology to normalize a particular (rotated) factor score, j, within factor i:

$$f_{ij}^{\%} = \frac{f_{ij} - \min(f_i)}{\max(f_i) - \min(f_i)} \tag{3.48}$$

This normalized score is then added/subtracted as in method one (Equation
(3.46)). The % symbol represents the normalized (rotated) factor score:

$$Y = \sum_{i=1}^{n} f_i^{\%} \quad \text{or} \quad Y = \sum_{i=1}^{n} f_i^{(R\%)} \tag{3.49}$$

The final method combines methods two and three into a single statistic.
Normalized (rotated) factor scores are calculated and these new scores are weighted by
their normalized eigenvalues, as shown in Equation (3.50):

$$Y = \sum_{i=1}^{n} \lambda_i f_i^{\%} \quad \text{or} \quad Y = \sum_{i=1}^{n} \lambda_i f_i^{(R\%)} \tag{3.50}$$
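The four linear combination methods of Equations (3.46)-(3.50) can be sketched in a few lines of Python. This is an illustrative sketch only, not the MATLAB® code used in this research. It assumes min-max scaling for the normalization step and assumes eigenvalues are normalized by dividing by their sum; the function names and the example scores and eigenvalues are hypothetical.

```python
def normalize(scores):
    """Min-max scale one factor's scores to [0, 1] (assumed form of Eq. 3.48)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine(factor_scores, signs, eigenvalues=None, normalized=False):
    """Collapse several factor-score columns into one response per run.

    factor_scores: list of columns, one list of scores per factor.
    signs: +1 for a factor to be minimized, -1 for one to be maximized.
    eigenvalues: if given, each factor is weighted by eigenvalue / sum.
    normalized: if True, each column is min-max scaled first.
    """
    columns = [normalize(c) if normalized else list(c) for c in factor_scores]
    if eigenvalues is None:
        weights = [1.0] * len(columns)
    else:
        total = sum(eigenvalues)
        weights = [e / total for e in eigenvalues]  # assumed normalization
    n_runs = len(columns[0])
    return [
        sum(sg * w * col[j] for sg, w, col in zip(signs, weights, columns))
        for j in range(n_runs)
    ]


# Hypothetical scores for two factors over four runs (illustration only).
f1 = [0.0, 1.0, 2.0, 4.0]
f2 = [10.0, 30.0, 20.0, 50.0]
signs = [+1, -1]   # minimize factor 1, maximize factor 2
eig = [3.0, 1.0]   # hypothetical eigenvalues

y_sum = combine([f1, f2], signs)                                  # Eq. 3.46
y_weighted = combine([f1, f2], signs, eigenvalues=eig)            # Eq. 3.47
y_norm = combine([f1, f2], signs, normalized=True)                # Eq. 3.49
y_weighted_norm = combine([f1, f2], signs, eig, normalized=True)  # Eq. 3.50
```

Each result is a single column that can be modeled in place of the original responses. The run with the smallest combined value is the most desirable under that method.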

Table 17 summarizes the methods created using factor analysis. These methods
are only applied if factor analysis reduces the problem to two or more factors. If factor
analysis suggests the use of a single factor, these linear combination methods become
unnecessary. This single dimension problem permits the use of quadratic regression or
ANNs to compute optimal robust settings or doubly robust settings.

Table 17. Summary of (Rotated) Factor Score Reductions

Method                                 Description                                                              Formula
Sum Factor Scores                      Sums factor scores appropriately                                         $Y = \sum_{i=1}^{n} f_i$
Sum Weighted Factor Scores             Weight factors by the eigenvalues (normalized to 1) and sum              $Y = \sum_{i=1}^{n} \lambda_i f_i$
                                       appropriately
Sum Normalized Factor Scores           Normalize factor scores and sum appropriately                            $Y = \sum_{i=1}^{n} f_i^{\%}$
Sum Weighted Norm. Factor Scores       Normalize factor scores, weight factors by eigenvalue, and sum           $Y = \sum_{i=1}^{n} \lambda_i f_i^{\%}$
                                       appropriately
Sum Rot. Factor Scores                 Sums rotated factor scores appropriately                                 $Y = \sum_{i=1}^{n} f_i^{(R)}$
Sum Weighted Rot. Factor Scores        Weight rotated factor scores by the eigenvalues (normalized)             $Y = \sum_{i=1}^{n} \lambda_i f_i^{(R)}$
                                       and sum appropriately
Sum Normalized Rot. Factor Scores      Normalize rotated factor scores and sum appropriately                    $Y = \sum_{i=1}^{n} f_i^{(R\%)}$
Sum Weighted Norm. Rot. Factor Scores  Normalize rotated factor scores, weight rotated factors by eigenvalue,   $Y = \sum_{i=1}^{n} \lambda_i f_i^{(R\%)}$
                                       and sum appropriately

3.5.2 Factor Analysis Problem

To demonstrate factor analysis on a multiple response problem, a simple five
response problem was created. The five responses, $y_1, y_2, y_3, y_4, y_5$, required the first
three to be minimized and the last two to be maximized. The problem was constructed to reflect
different optimal solutions among several responses. Two control variables and two
noise variables were utilized. To re-sample data for each response, a $3^2 \times 2^2$ crossed
array design, resulting in 36 runs, was implemented. An error term of
$\varepsilon \sim \text{Normal}(0, \sqrt{0.9})$ was applied to each model to allow for variation in re-sampling of
data. Equations (3.51)-(3.55) represent the true regression models for the five responses.

$$\begin{aligned} y_1(x,z) = {} & 4.2 + 1.21x_1 - 0.92x_2 - 0.05z_1 + 0.07z_2 + 1.11x_1^2 + 0.88x_2^2 \\ & + 1.97x_1x_2 + 0.35x_1z_1 - 0.54x_1z_2 - 0.22x_2z_1 - 0.85x_2z_2 + \varepsilon \end{aligned} \tag{3.51}$$

$$\begin{aligned} y_2(x,z) = {} & 4 + 1.2x_1 - 1.00x_2 + 0.05z_1 + 0.1z_2 + 1.00x_1^2 + 1.00x_2^2 \\ & + 2x_1x_2 + 0.55x_1z_1 - 0.65x_1z_2 - 0.18x_2z_1 - 0.90x_2z_2 + \varepsilon \end{aligned} \tag{3.52}$$

$$\begin{aligned} y_3(x,z) = {} & 20 + 2.4x_1 - 2.1x_2 + 1.00z_1 + 0.79z_2 + 2.79x_1^2 - 1.66x_2^2 \\ & + 0.94x_1x_2 + 1.00x_1z_1 + 0.01x_1z_2 - 1.31x_2z_1 + 0.11x_2z_2 + \varepsilon \end{aligned} \tag{3.53}$$

$$\begin{aligned} y_4(x,z) = {} & 21 - 4.2x_1 - 2x_2 - 0.60z_1 + 0.90z_2 + 1.1x_1^2 + 0.90x_2^2 \\ & + 0.02x_1x_2 - 0.06x_1z_2 + 0.11x_2z_1 + 0.15x_2z_2 + \varepsilon \end{aligned} \tag{3.54}$$

$$\begin{aligned} y_5(x,z) = {} & 19 - 3.9x_1 - 2x_2 + 1.21z_1 - 1.57z_2 - 1.22x_1^2 + 1.00x_2^2 \\ & + 2.03x_1x_2 - 0.88x_1z_1 + 0.84x_1z_2 - 1.2x_2z_1 + \varepsilon \end{aligned} \tag{3.55}$$

The true optimal robust settings for each response were calculated by using a full
five level factorial design. This design was replicated 100 times to collect a
generous amount of data for accuracy. A mean and a variance were calculated across
all noise variable combinations for each unique control setting combination. Finally, an
LT value was computed based on whether the response was to be minimized or maximized. The
minimum LT value was selected in all instances, with the corresponding control settings
representing the optimal robust settings. The true optimal settings for each response
problem are given in Table 18.
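The crossed array and the per-setting mean and variance can be illustrated with a short Python sketch. It is a simplified illustration, not the procedure used to produce Table 18: it uses the $3^2 \times 2^2$ crossed array rather than the replicated five level design, omits the error term, assumes coded levels of -1, 0, 1 for the control variables and -1, 1 for the noise variables, and uses the population variance. The function and variable names are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance


def y1_mean_model(x1, x2, z1, z2):
    """Deterministic part of Equation (3.51); the error term is omitted."""
    return (4.2 + 1.21 * x1 - 0.92 * x2 - 0.05 * z1 + 0.07 * z2
            + 1.11 * x1 ** 2 + 0.88 * x2 ** 2 + 1.97 * x1 * x2
            + 0.35 * x1 * z1 - 0.54 * x1 * z2
            - 0.22 * x2 * z1 - 0.85 * x2 * z2)


control_levels = [-1, 0, 1]  # assumed coded levels, 3^2 inner array
noise_levels = [-1, 1]       # assumed coded levels, 2^2 outer array

inner = list(product(control_levels, repeat=2))
outer = list(product(noise_levels, repeat=2))
crossed = [(x, z) for x in inner for z in outer]  # 9 x 4 = 36 runs

# Mean and variance across the noise combinations for each control setting.
summary = {}
for x1, x2 in inner:
    responses = [y1_mean_model(x1, x2, z1, z2) for z1, z2 in outer]
    summary[(x1, x2)] = (mean(responses), pvariance(responses))
```

The dictionary `summary` holds one (mean, variance) pair per control setting, which is the input a robustness criterion such as LT would then be computed from.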


Table 18. Optimal Settings for Five Response Problems

Response     x1      x2     Optimal LT
Y1          -1       1          5.9
Y2          -1       1          4.33
Y3          -0.5     1        237.18
Y4          -1      -1       -855.48
Y5          -1      -1       -695.8

Individually,  the  settings  for  the  first  three  responses  were  similar.  Also,  the  final 
two  responses  possessed  identical  optimal  robust  settings.  Quadratic  regression  and 
ANNs  were  applied  to  the  five  responses  individually  to  demonstrate  their  capability  in 
accurately  modeling  the  problems.  As  a  side  note,  RBFNNs  and  GRNNs  obtained 
similar  results.  Table  19  displays  the  results,  which  show  accurate  modeling  was 
performed  using  either  technique.  This  was  expected  for  quadratic  regression  because 
the  true  models  are  quadratic  in  nature. 


Table 19. Settings from QR and ANN for Five Responses

             Quadratic Regression        ANNs
Response        x1      x2            x1      x2
Y1             -1       1            -1       1
Y2             -1       1            -1       1
Y3             -0.5     1            -0.5     1
Y4             -1      -1            -1      -1
Y5             -1      -1            -1      -1

Factor  analysis  was  applied  to  the  five  responses.  A  factor  loadings  matrix  was 
constructed to determine which responses could be grouped together. Table 20 displays
the factor loadings matrix, which shows two important factors. The first factor grouped
the  minimized  responses  while  the  second  grouped  the  maximized  responses.  These  two 
factors  explained  92%  of  the  variance. 


Table 20. Factor Loadings Matrix

        Factor 1    Factor 2
Y1       0.9224      0.3041
Y2       0.9179      0.3205
Y3       0.8017      0.0741
Y4      -0.4531      0.7675
Y5      -0.3422      0.8367

The MATLAB® function rotatefactors was applied to this data to determine the
rotated factor loadings matrix. Three different factors were suggested when rotating the
factors. The first two responses formed Factor 1, the maximized responses formed Factor 2,
and the third minimized response corresponded to its own factor. Table 21 displays the
rotated factor loadings matrix and the corresponding responses.


Table 21. Rotated Factors Loadings Matrix

        Rot. Factor 1    Rot. Factor 2    Rot. Factor 3
Y1          0.9613          -0.0691          -0.2503
Y2          0.9601          -0.0514          -0.2561
Y3          0.4202          -0.1395          -0.8861
Y4         -0.1767           0.8968          -0.0113
Y5          0.0645           0.8868           0.1803
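The groupings read from Tables 20 and 21 can be reproduced by assigning each response to the factor on which it has the largest absolute loading. The Python sketch below illustrates that reading of the loadings matrices. The assignment rule and the function name are assumptions made for illustration, not a procedure stated in this research.

```python
def dominant_factor(loadings):
    """Map each response to the 1-based index of its largest |loading|."""
    groups = {}
    for name, row in loadings.items():
        magnitudes = [abs(v) for v in row]
        groups[name] = magnitudes.index(max(magnitudes)) + 1
    return groups


unrotated = {  # Table 20
    "Y1": [0.9224, 0.3041],
    "Y2": [0.9179, 0.3205],
    "Y3": [0.8017, 0.0741],
    "Y4": [-0.4531, 0.7675],
    "Y5": [-0.3422, 0.8367],
}

rotated = {  # Table 21
    "Y1": [0.9613, -0.0691, -0.2503],
    "Y2": [0.9601, -0.0514, -0.2561],
    "Y3": [0.4202, -0.1395, -0.8861],
    "Y4": [-0.1767, 0.8968, -0.0113],
    "Y5": [0.0645, 0.8868, 0.1803],
}

unrotated_groups = dominant_factor(unrotated)
rotated_groups = dominant_factor(rotated)
```

With the unrotated loadings, Y1 through Y3 fall on Factor 1 and Y4 and Y5 on Factor 2. With the rotated loadings, Y3 moves to its own factor (Rot. Factor 3), where its loading is negative.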

Factor analysis reduced the five responses into a more manageable two or three
response problem, depending on whether rotation was utilized. The issue of multiple
responses remained evident. Different techniques to handle problems with more than one
factor were given in Table 17. All eight techniques were applied for comparison.
Quadratic regression and ANNs were applied to the (rotated) factor scores for
modeling. This approach differed from examining factor plots to choose optimal settings.
For simplicity, a step size of 0.5 was utilized on each control variable. The
results are summarized in Table 22, including “General,” which simply added/subtracted
the standardized response data (Davis, 2009). The table reports the factor analysis method
performed, its mathematical notation, and the optimal settings obtained through quadratic
regression and ANNs.
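For comparison, the “General” approach of adding/subtracting standardized response data can be sketched as follows. This is a hedged illustration only: it assumes that standardizing means converting each response to z-scores using the population standard deviation, and the function names and example data are hypothetical rather than taken from Davis (2009).

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert one response column to z-scores (assumed standardization)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def general_response(responses, signs):
    """Sum standardized responses; +1 for minimized, -1 for maximized."""
    z = [standardize(col) for col in responses]
    n_runs = len(z[0])
    return [sum(sg * col[j] for sg, col in zip(signs, z)) for j in range(n_runs)]


# Hypothetical data: one response to minimize, one to maximize, three runs.
y_min = [1.0, 2.0, 3.0]
y_max = [30.0, 10.0, 20.0]
combined = general_response([y_min, y_max], [+1, -1])
best_run = combined.index(min(combined))
```

In this small example the first run is preferred because it has both the lowest value of the minimized response and the highest value of the maximized response.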


Table 22. Robust Solutions for (Rotated) Factor Reduced Methods

                                                                                                    Quad. Reg.        ANNs
Method                              Math                                                            x1      x2      x1      x2
General                             $Y_1 + Y_2 + Y_3 - Y_4 - Y_5$ (standardized)                    0.5     0.5     0.5     0.5
Sum Factors                         $F_1 - F_2$                                                    -1      -1      -1      -1
Sum Weighted Factors                $\lambda_1 F_1 - \lambda_2 F_2$                                -1       1      -1       1
Sum Norm. Factors                   $F_1^{\%} - F_2^{\%}$                                          -1      -1      -1      -1
Sum Weighted Norm. Factors          $\lambda_1 F_1^{\%} - \lambda_2 F_2^{\%}$                      -1       1      -1       1
Sum Rot. Factors                    $F_1^{(R)} - F_2^{(R)} - F_3^{(R)}$                            -1       1      -1       1
Sum Weighted Rot. Factors           $\lambda_1 F_1^{(R)} - \lambda_2 F_2^{(R)} - \lambda_3 F_3^{(R)}$       -1       1      -1       1
Sum Norm. Rot. Factors              $F_1^{(R\%)} - F_2^{(R\%)} - F_3^{(R\%)}$                      -1       1      -1       1
Sum Weighted Norm. Rot. Factors     $\lambda_1 F_1^{(R\%)} - \lambda_2 F_2^{(R\%)} - \lambda_3 F_3^{(R\%)}$ -1       1      -1       1

Quadratic regression and ANNs obtained identical results. All of the
methods, with the exception of “General,” summing the factor scores, and summing the
normalized factor scores, suggested robust settings of [-1, 1]. To determine how “good”
these solutions were, the true LT values for each response were examined for each
particular combination of settings, given in Table 23. The highlighted cell in each
response column represents the optimal LT for that particular response, as shown in
Table 18.


Table 23. LT Values for Factor Analysis Problem

  x1      x2        Y1        Y2         Y3          Y4          Y5
 -1      -1       78.61     73.99     481.43     -855.48*    -695.80*
 -1      -0.5     42.18     37.67     468.17     -759.50     -559.49
 -1       0       21.27     17.73     421.02     -692.68     -459.23
 -1       0.5     10.50      8.07     345.55     -651.79     -388.05
 -1       1        5.90*     4.33*    251.48     -634.68     -340.73
 -0.5    -1       56.09     53.44     424.91     -692.08     -595.19
 -0.5    -0.5     31.95     28.81     421.37     -606.28     -492.14
 -0.5     0       18.59     15.75     385.28     -546.99     -418.94
 -0.5     0.5     12.24      9.89     321.02     -510.89     -370.30
 -0.5     1       10.51      8.61     237.18*    -495.92     -342.40
  0      -1       45.22     43.06     428.35     -572.06     -475.09
  0      -0.5     29.51     26.59     433.90     -494.47     -403.77
  0       0       21.64     18.61     406.12     -441.26     -356.60
  0       0.5     19.42     16.49     348.23     -409.18     -329.74
  0       1       22.06     19.36     267.61     -396.00     -321.11
  0.5    -1       43.32     40.60     492.01     -487.45     -345.12
  0.5    -0.5     34.19     30.48     507.99     -416.28     -302.19
  0.5     0       31.75     27.52     487.78     -367.77     -278.11
  0.5     0.5     35.38     30.82     433.38     -338.68     -270.60
  0.5     1       45.92     41.27     350.98     -326.88     -279.02
  1      -1       49.96     45.52     628.01     -432.27     -217.37
  1      -0.5     47.58     41.71     657.77     -365.55     -197.54
  1       0       52.51     45.44     646.34     -320.38     -191.76
  1       0.5     65.75     57.60     594.52     -293.43     -199.21
  1       1       89.71     80.79     507.33     -282.65     -220.71

* Optimal LT for that response (the highlighted cell).

Table 23 contains LT values on different scales depending on the response.
To account for this, a percentage from the optimal value was computed for each response. This
was calculated by taking each LT value (for a particular response) and determining the
percentage distance of this value from the optimal value (highlighted cell). Table 24
displays the percentage from optimal values. A column is appended to Table 24 to
represent the average percentage distance of a particular setting from the optimal values.
Finally, Table 24 is sorted on this column to represent the suggested true optimal
settings for this problem.


Table 24. Percentage From Optimal LT Values

  x1      x2         Y1         Y2        Y3       Y4       Y5     Average
 -1       1         0.00       0.00      6.03    25.81    51.03     16.57
 -0.5     1                   98.95      0.00    42.03    50.79     53.97
 -1       0.5                 86.30     45.69    23.81    44.23     55.59
 -0.5     0.5     107.42     128.39     35.35    40.28    46.78     71.64
 -0.5     0       215.16     263.77     62.44    36.06    39.79    123.44
  0       0.5     229.10     280.74     46.82    52.17    52.61    132.29
 -1       0       260.48     309.40     77.51    19.03    34.00    140.08
  0       1       273.89     347.08     12.83    53.71    53.85    148.27
  0       0       266.80     329.74     71.23    48.42    48.75    152.99
  0      -0.5     400.16     514.19     82.94    42.20    41.97    216.29
 -0.5    -0.5     441.49     565.46     77.66    29.13    29.27    228.60
  0.5     0       438.07     535.49    105.66    57.01    60.03    239.25
  0.5    -0.5     479.46     604.04    114.18    51.34    56.57    261.12
  0.5     0.5     499.73     611.79     82.72    60.41    61.11    263.15
 -1      -0.5     614.86     769.93     97.39    11.22    19.59    302.60
  0.5    -1       634.19     837.57    107.44    43.02    50.40    334.52
  0.5     1       678.29     853.07     47.98    61.79    59.90    340.21
  0      -1       666.49     894.48     80.60    33.13    31.72    341.28
  1      -0.5     706.38     863.20    177.33    57.27    71.61    375.16
  1      -1       746.74     951.30    164.78    49.47    68.76    396.21
  1       0       790.05     949.46    172.51    62.55    72.44    409.40
 -0.5    -1       850.74    1134.12     79.15    19.10    14.46    419.51
  1       0.5    1014.47    1230.23    150.66    65.70    71.37    506.49
 -1      -1      1232.44    1608.80    102.98     0.00     0.00    588.84
  1       1      1420.53    1765.87    113.90    66.96    68.28    687.11
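The percentage-from-optimal calculation can be checked for a single row. The Python sketch below takes the optimal LT values from Table 18 and the LT values at the settings [-1, 1] from Table 23, and computes the percentage distance and its average. The use of absolute values, so that negative LT values are handled consistently, is an assumption, as are the function and variable names.

```python
def percent_from_optimal(lt, lt_opt):
    """Percentage distance of an LT value from the optimal LT value."""
    return abs(lt - lt_opt) / abs(lt_opt) * 100.0


# Optimal LT per response (Table 18).
optimal = {"Y1": 5.90, "Y2": 4.33, "Y3": 237.18, "Y4": -855.48, "Y5": -695.80}

# LT values at the settings x1 = -1, x2 = 1 (Table 23).
at_setting = {"Y1": 5.90, "Y2": 4.33, "Y3": 251.48, "Y4": -634.68, "Y5": -340.73}

pct = {name: percent_from_optimal(at_setting[name], optimal[name])
       for name in optimal}
average = sum(pct.values()) / len(pct)
```

Rounded to two decimals, these values agree with the first row of Table 24 (0.00, 0.00, 6.03, 25.81, 51.03, with an average of 16.57).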

According to the results of the final column in Table 24, the settings [-1, 1] were
suggested as the true optimal settings across all five responses. Using these settings
achieved the smallest average percentage difference from the individual optimal solutions.
According to Table 22, most of the derived factor analysis linear combinations determined
these exact results.


The un-weighted factor score methods, which produced settings of [-1, -1], were the only
factor analysis methods unable to obtain the optimal solution. This was most likely because
responses 4 and 5 should not have been held at the same weight as the other three.
Finally, the “General” settings, [0.5, 0.5], corresponded to the “middle of the road”
suggested settings. For this example, this method never achieved an optimal
solution for any response, but attempted to avoid performing consistently badly in any
situation.

Although factor analysis was unable to reduce the five responses into a single
dimension, the linear combination techniques proved useful. Therefore, it is suggested
that factor analysis (with the linear combination techniques), as a mathematical
approach to reducing high dimension response problems into a single dimension, is
superior to simply summing standardized response data (Davis, 2009).

IV. Application


4.1  Overview 

In  Chapter  3,  research  methodology  was  discussed.  Examples  of  three  different 
areas  of  research  relating  to  robust  parameter  design  were  presented.  The  first  area 
involved  a  search  for  new  control  variable  settings  that  guard  against  system  degradation. 
The  second  area  utilized  artificial  neural  networks  rather  than  quadratic  regression  to  fit 
highly  nonlinear  problems.  Finally,  factor  analysis  was  implemented  to  reduce  the 
dimensionality  of  multiple  response  problems. 

These  three  techniques  were  applied  to  a  computer  algorithm  developed  by 
Johnson  (2008).  The  computer  algorithm,  an  autonomous  global  anomaly  detector 
known  as  AutoGAD,  has  demonstrated  usefulness  in  locating  targets  (anomalies)  in 
hyper-spectral  imagery  (HSI).  AutoGAD  is  currently  employed  using  control  settings 
suggested by Johnson, which were derived through experience. This research determined
more  robust  (and  doubly  robust)  settings  than  those  currently  implemented  or  previously 
researched,  as  in  Davis  (2009). 

A  background  on  hyper-spectral  imagery,  an  explanation  of  AutoGAD,  and 
results  obtained  from  applying  the  new  techniques  to  AutoGAD  will  be  provided  in  this 
chapter. 

4.2  Hyper-spectral  Imagery 

Hyper-spectral images are taken of an object or area of interest much like a digital
photograph. The primary difference between hyper-spectral images and digital
photographs lies in the region of the electromagnetic (EM) spectrum in which the
image is taken. The EM spectrum is displayed in Figure 24 for reference. The typical
photograph produced from a common camera uses the visible part of the spectrum. This
consists of a small number of bands, possibly just one band if dealing with black and
white photographs. HSI utilizes the region from ultraviolet to infrared, as highlighted in
Figure 25. The highlighted area consists of hundreds of bands, thus allowing for more
information to be obtained about the object/area of interest.

Figure 24. Electromagnetic Spectrum (Pabich, 2002)

Figure 25. HSI range in EM spectrum (Landgrebe, 2003)

The  capabilities  of  the  HSI  sensors  are  presented  in  detail  by  Pabich  (2002).  HSI 
sensors collect some form of reflected natural light, such as sunlight, from different
objects.  The  energy  of  the  reflected  light  is  summarized  into  wavelength  bins  of  the  EM 
spectrum.  Information  given  by  the  reflectance  of  objects  allows  for  detection  and 
identification. 

Once the images are taken, a three dimensional HSI data cube is constructed, as
seen in Figure 26. Viewing the cube from the spatial dimensions (two dimensional: i and
j) is the same as viewing a photograph on a piece of paper. The spectral dimension (k)
acts much like a stack of photographs on a table; each photograph is the same image but
represents a different band of the EM spectrum. A representation of the spectral
dimension (k) of a data cube is given in Figure 27.

Figure 26. HSI Cube Example (Shaw et al., 2002)


Figure  27.  Layers  of  Data  in  Spectral  Dimension  (Miller,  2009) 


To analyze the HSI data cube, the data in each image is converted into a
two-dimensional matrix (Smetek, 2007). Suppose data from a hyper-spectral image that is
100 pixels wide by 75 pixels long contains 300 bands. This results in 7500 pixels, each with
300 spectral data points. Each of the 7500 pixels, taken across the 300 spectral bands, is
converted into a single column in a new data matrix. The final data matrix size is 300
rows by 7500 columns, where each column represents a single pixel in the original image
across all 300 spectral bands. This matrix is transposed to perform multivariate analysis.
The process of transforming the HSI images into a matrix that can be utilized in numeric
calculations (Miller, 2009) is depicted in Figure 28.

Figure 28. Transforming HSI Cube into Data (Miller, 2009)

4.3 AutoGAD


Johnson  (2008)  developed  an  autonomous  global  anomaly  detector  (AutoGAD) 
algorithm  which  provides  information  on  the  location  of  possible  targets  (anomalies)  in 
real  time.  This  computer  algorithm  utilizes  the  data  matrix  obtained  from  HSI  data  cubes 
to  quickly  and  accurately  locate  possible  targets  within  an  image  (Johnson,  2008;  Davis, 
2009;  Miller,  2009).  AutoGAD  is  written  in  MATLAB®  and  inputs  HSI  data  to  provide 
a  resultant  image  of  where  targets  are  located.  For  testing  purposes,  AutoGAD  has  the 
capability  to  take  “truth”  images  to  determine  the  performance  of  the  algorithm.  A  small 
discussion  is  provided  on  how  AutoGAD  works,  and  for  a  more  complete  understanding, 
the  reader  is  referred  to  Johnson  (2008). 

After  converting  the  HSI  data  into  a  two-dimensional  matrix,  as  seen  in  Section 
4.2,  the  algorithm  was  employed  to  assist  the  user  in  finding  targets  within  the  images,  as 
depicted  in  Figure  29  (Johnson,  2008). 
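The conversion of an HSI data cube into a two-dimensional matrix, described in Section 4.2, can be sketched in Python as follows. This is a minimal illustration rather than the AutoGAD code: the cube is a placeholder filled with zeros, the nested-list layout indexed as [i][j][k] and the row-by-row pixel ordering are assumptions, and the function name is hypothetical.

```python
def cube_to_matrix(cube):
    """Flatten an HSI cube indexed [i][j][k] into a pixels-by-bands matrix.

    Each output row holds the spectrum of one pixel, which corresponds to
    the transposed form of the bands-by-pixels matrix in Section 4.2.
    """
    return [list(spectrum) for row in cube for spectrum in row]


# Dimensions from the example in Section 4.2: 100 wide, 75 long, 300 bands.
rows, cols, bands = 75, 100, 300

# Placeholder cube (all zeros), used only to show the shapes involved.
cube = [[[0.0] * bands for _ in range(cols)] for _ in range(rows)]

matrix = cube_to_matrix(cube)
n_pixels, n_bands = len(matrix), len(matrix[0])
```

The result has 7500 rows (pixels) and 300 columns (bands), the orientation used for multivariate analysis.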


Figure  29.  AutoGAD  Algorithm 

The  first  step  applied  principal  components  analysis  (PCA)  to  the  data  in  order  to 
reduce  the  dimensionality.  Once  the  dimensionality  was  reduced,  the  data  was  centered 
and  scaled  around  zero  with  unit  variance,  which  is  known  as  whitening.  Johnson  (2008) 
proposed  a  Maximum  Distance  Secant  Line  (MDSL)  to  ascertain  the  amount  of  variance 
that should be retained in the dimensionality reduction following the whitening stage.

The second step sent the reduced data matrix through a process known as
independent  components  analysis  (ICA).  ICA  is  not  discussed  here  and  the  reader  is 
referred  to  Johnson  (2008)  and  its  appropriate  references. 

Step  three,  feature  selection,  determined  which  objects  were  possible  targets.  A 
key  assumption  of  AutoGAD  is  that  the  targets  are  rare  in  occurrence  and  are  truly 
anomalies.  Histograms  were  constructed  to  determine  the  frequency  of  the  potential 
targets.  Also,  signal-to-noise  ratio  (SNR)  statistics  were  employed  to  distinguish  the 
targets from background noise. Thresholds were set to determine the pixels that fell outside
the range of the background pixels. Miller (2009) extended this work by refining
the thresholds for faster and better classification of potential targets.

The  final  step  identified  which  pixels  were  indeed  targets.  If  some  target  pixels 
were  very  close  to  what  is  referred  to  as  the  “zero  bin,”  an  iterative  adaptive  noise  (IAN) 
filtering  technique  was  utilized  by  the  algorithm  for  better  distinguishing  between  the 
targets  and  background. 

4.3.1  AutoGAD  Outputs 

The  typical  output  for  AutoGAD  is  a  dark  image  with  highlighted  spots  in 
different  colors  which  represent  the  targets,  or  anomalies.  Then,  the  user  can  distinguish 
where  these  targets  were  located  by  comparing  the  outputted  images  to  the  original 
images  inputted  into  AutoGAD. 

Alternatively, if the user possesses a “truth mask” to accompany the images, four
outputs can be obtained from AutoGAD to determine the capability of the algorithm in
detecting anomalies within images. The first output is “time,” which is measured in
seconds. This reflects the amount of time it takes for the algorithm to complete its
“search” through the image. “True Positive Fraction” (TPF) is the second output, which
measures how well AutoGAD detected the anomalies marked in the “truth
mask.” This value is obtained by taking the ratio of the number of pixels that AutoGAD
correctly called targets (“T”) to the number of real target pixels (T) in the truth image,
or P(“T” | T). The number of true target pixels is in the denominator while the number of
target pixels correctly identified by AutoGAD is in the numerator. TPF will always be a
value between zero and one. The third output, “False Positive Fraction” (FPF), is computed
by dividing the number of pixels incorrectly labeled targets (“T”) by the true number of
non-target pixels (F), or P(“T” | F). Again, this value ranges between zero and one. Finally,
“Target Fraction Percent” (TFP) is the ratio of true positives to the sum of true positives
and false positives.
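The three detection outputs can be computed from a predicted mask and a truth mask as in the Python sketch below. This is an illustration of the definitions above, not the AutoGAD implementation. The masks are hypothetical flat lists of 0/1 values, and the sketch assumes the truth mask contains both target and non-target pixels and that at least one pixel is labeled a target.

```python
def detection_fractions(predicted, truth):
    """Return (TPF, FPF, TFP) from flat 0/1 masks of equal length."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # correct targets
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # false alarms
    targets = sum(1 for t in truth if t)
    non_targets = len(truth) - targets
    tpf = tp / targets        # P("T" | T)
    fpf = fp / non_targets    # P("T" | F)
    tfp = tp / (tp + fp)      # share of labeled targets that are real
    return tpf, fpf, tfp


# Hypothetical ten-pixel example: four true targets, four pixels labeled.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpf, fpf, tfp = detection_fractions(predicted, truth)
```

Here three of the four true targets are found and one of the six background pixels is mislabeled, so TPF is 3/4, FPF is 1/6, and TFP is 3/4.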

The  objective  of  the  algorithm  is  to  accurately  detect  all  anomalies  in  a  quick 
manner (Johnson, 2008). Within AutoGAD, different settings have to be selected by the
user, and these influence the four outputs of the algorithm. Specific combinations of
settings  cause  increases/decreases  in  detection  performance  as  well  as  processing  time  for 
different  images.  Different  images  also  have  an  effect  on  the  outputs  as  the  optimal 
combination  of  settings  for  a  particular  image  is  not  uniform  for  all  images.  Thus,  robust 
parameter  design  (RPD)  is  desired  to  determine  the  best  settings. 

4.3.2  AutoGAD  Control  Variables 

AutoGAD  contains  eleven  controllable  variables  that  need  to  be  set  prior  to 
running the algorithm. Table 25 displays the control variable name, type, and range. For
detail on how each control variable works in AutoGAD, the reader is referred to Johnson
(2008).


Table 25. Control Variables in AutoGAD

Control Variable            Type          Range
Dimension Adjust            Discrete      [-2, 2]
Max Score Threshold         Continuous    [6, 14]
Bin Width SNR               Continuous    [0.01, 0.1]
PT SNR Threshold            Continuous    [1, 6]
Bin Width Identify          Continuous    [0.01, 0.1]
Smooth Iterations High      Discrete      [50, 150]
Smooth Iterations Low       Discrete      [5, 45]
Low SNR                     Continuous    [4, 14]
Window Size                 Discrete      [3, 11]
Threshold Both Sides        Discrete      [0, 1]
Clean Signal                Discrete      [0, 1]

4.3.3  AutoGAD  Noise  Variables 

AutoGAD  was  a  good  candidate  for  RPD  application  not  only  because  of  the 
eleven  control  variables,  but  also  because  of  the  possibility  of  noise  in  the  system.  Davis 
(2009)  first  attempted  to  capture  the  noise  within  AutoGAD  by  stating  the  images 
themselves  were  noise.  This  was  a  reasonable  suggestion  since  it  is  unknown  what  image 
will  be  sent  through  AutoGAD  for  detection  purposes.  Davis  suggested  that  the  noise 
variables  were  categorical.  Although  research  exists  for  dealing  with  categorical  noise 
variables  (Brenneman  &  Myers,  2003),  constructing  the  mean  and  variance  models 
became  complicated  and  involved  prior  probabilities.  Here,  an  alternative  method  of 
modeling  noise  was  considered  in  which  continuous  noise  variables  were  implemented. 

Three  new  noise  variables  were  constructed  for  the  AutoGAD  algorithm  by 
Mindrup  et  al.  (2010).  The  noise  variables  required  a  “truth  mask”  to  determine  the 
appropriate output values for an image. The percentage of target pixels within each
image was considered as the first noise variable. This was calculated by taking the ratio
of the number of target pixels to the number of background pixels.

A Fisher’s ratio was calculated for the second noise variable. Lohninger (1999)
described the Fisher’s ratio as a measure of the discriminating power of a particular
variable. It attempted to portray the overlap of two distributions through mean ($\mu$) and
variance ($\sigma^2$). Class 1 was considered the target class and class 2 was the background
class, as shown in Equation (4.1):

$$f = \frac{\mu_1^2 - \mu_2^2}{\sigma_1^2 + \sigma_2^2} \tag{4.1}$$


The  final  noise  variable  considered  was  the  number  of  clusters  in  a  given  image. 
AutoGAD  employed  an  X-means  clustering  algorithm  recommended  by  Williams  (2007) 
which  determined  the  number  of  clusters  by  partitioning  observations  into  different 
clusters  based  on  their  mean  values.  These  three  noise  variables  were  developed  to 
provide  characteristics  of  an  image  rather  than  examining  the  image  as  a  categorical 
variable.  All  three  noise  variables  developed  were  continuous  in  nature  allowing  for  a 
traditional  RPD. 


4.3.4  AutoGAD  Setup  Summary 

An RPD was performed on AutoGAD to determine settings for the eleven control
variables based on the three suggested noise variables. Eight images, as well as their
“truth masks,” were provided to the author for research on AutoGAD. To obtain more
data, each of the eight images was divided in half, allowing for 16 images to be analyzed.
When performing RPD, eight half-images were used for training (quadratic regression or
artificial neural networks) and the other eight for testing, verification, and validation.
This method allowed for a holdout set which determined how results would look if the
RPD settings were implemented on images never before seen, thus providing results closer
to a real-life scenario.

This method differed from Johnson (2008) and Davis (2009), who each developed
control variable settings based on knowledge of all 16 half-images. This can bias the
results since no new information was presented to AutoGAD to determine its robustness
to new images. This research combated this problem by utilizing the holdout
method discussed above.

4.4  AutoGAD  and  RPD 

As  explained  in  Section  4.3.1,  AutoGAD  contained  four  different  output  values 
for  detection:  Time,  TPF,  FPF,  and  TFP.  This  constituted  a  four  response  problem  in 
which  traditional  quadratic  regression  and  ANNs  were  applied  to  the  AutoGAD  data  sets 
to  derive  mean  and  variance  models  for  each  output.  These  results  were  then  compared 
to  demonstrate  the  benefit  of  using  ANNs  when  quadratic  regression  fails  to  model  the 
problem  appropriately. 

As will be shown, the four responses did not all share the same combination of
optimal robust settings. This warranted the use of RPD with multiple responses to
combine the four responses into a single dimension. A discussion is provided in Section
4.5 on the techniques used to formulate a model(s) which considered all four responses
simultaneously in AutoGAD.

4.4.1 AutoGAD RPD Design


To conduct an RPD, an appropriate design was constructed to intelligently collect
data from AutoGAD. A full factorial design with 11 control variables and three noise
variables was too large a design to choose due to the time required to collect data. As
shown in the example problems of Chapter 3, combined array designs were appropriate
when constructed correctly. Two methods of performing ANNs that involved
“pre-processing” and “post-processing” were presented in Section 3.4.1. The former required
a crossed array design to determine appropriate mean and variance values across noise
settings. Therefore, a central composite design (CCD) on the control variables was
crossed with a 2^3 factorial design on the noise variables. The resultant design contained
2160 CCD runs crossed with eight noise runs, resulting in 17,280 design points.
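The crossing step described above can be sketched in a few lines. This is only an illustration of how a crossed array is assembled and counted: the function name `crossed_array` is hypothetical, and the control rows below are placeholders standing in for the 2160 CCD runs, not the actual design.

```python
from itertools import product

def crossed_array(control_runs, noise_runs):
    """Pair every control run with every noise run (a crossed array)."""
    return [tuple(c) + tuple(z) for c, z in product(control_runs, noise_runs)]

# 2^3 full factorial on the three coded noise variables: 8 runs.
noise_design = list(product((-1, 1), repeat=3))

# Placeholder control design (assumption): 2160 distinct rows of 11 values.
# The first entry is just a row index, not a real coded CCD setting.
control_design = [(i,) + (0,) * 10 for i in range(2160)]

design = crossed_array(control_design, noise_design)
print(len(noise_design), len(design), len(design[0]))
```

Each row of the result carries 11 control entries followed by 3 noise entries, so the row count is the product of the two design sizes.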

In the CCD, nine of the control variables were varied over five levels: one center
point, one at each of the plus and minus factorial points, and one at each of the plus and
minus face-centered points (same as the factorial). These nine control variables were tested at each
plus and minus factorial point of the remaining two control variables, since only a range
of [0,1] was utilized on these two controls. This CCD allowed an appropriate number of
samples to be taken from the large design space of the control variables. A sample segment of
the CCD is given in Appendix B. This design was large, but since the AutoGAD
algorithm operated relatively quickly, collecting this amount of data was not too costly in
terms of time or money. Finally, this design allowed for adequate comparison of
quadratic regression versus the ANNs.

For this research, all settings were in terms of coded values. This resulted in
every control or noise variable setting ranging over [-1,1]. To code natural (uncoded)
values, Equation (4.2) was used (Myers & Montgomery, 2002), where $\xi_{ij}$ represented the
ith natural setting of the jth variable used to obtain the coded ($x_{ij}$) setting. Below, max
represented the maximum value for the jth variable across all rows (i) of natural settings:

$$x_{ij} = \frac{\xi_{ij} - \left[\dfrac{\max(\xi_{\cdot j}) + \min(\xi_{\cdot j})}{2}\right]}{\left[\dfrac{\max(\xi_{\cdot j}) - \min(\xi_{\cdot j})}{2}\right]} \tag{4.2}$$

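Following the analysis of RPD, the optimal settings were calculated in coded
terms. These values were then converted back to their natural settings by applying
Equation (4.3):

$$\xi_{ij} = x_{ij}\left[\frac{\max(\xi_{\cdot j}) - \min(\xi_{\cdot j})}{2}\right] + \left[\frac{\max(\xi_{\cdot j}) + \min(\xi_{\cdot j})}{2}\right] \tag{4.3}$$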
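The coding and decoding steps of Equations (4.2) and (4.3) can be checked with a short sketch. The function names and the 50 to 150 range are illustrative assumptions, not values taken from the study's code.

```python
def code_value(xi, lo, hi):
    """Equation (4.2): map a natural setting xi in [lo, hi] to [-1, 1]."""
    center = (hi + lo) / 2.0
    half_range = (hi - lo) / 2.0
    return (xi - center) / half_range

def decode_value(x, lo, hi):
    """Equation (4.3): map a coded setting x in [-1, 1] back to natural units."""
    return x * (hi - lo) / 2.0 + (hi + lo) / 2.0

# Assumed natural range for illustration only.
lo, hi = 50.0, 150.0
print(code_value(50.0, lo, hi), code_value(100.0, lo, hi), code_value(150.0, lo, hi))
print(decode_value(0.5, lo, hi))
```

Decoding a coded value and coding it again returns the starting value, which is a convenient sanity check when converting optimal coded settings back to natural units.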


4.4.2  AutoGAD  Quadratic  Regression  RPD 

Once the CCD was set up, data were collected in AutoGAD based on the design
points for all four response values for each run. At this point, AutoGAD was separated
into four different problems, one for each response. Box-Cox (Myers & Montgomery,
2002) analysis was applied to the resultant response values to determine the need
for any transformations. A lambda value, which represents the power to which the
response data is raised based on the Box-Cox transformation, was obtained and response
values were re-calculated. The transformed data raised the R² statistic and provided a
better fit.

Quadratic regression was applied to each of the four outputs separately and an
overall process model for each response was calculated. The overall process model was
derived by calculating (X'X)^(-1)(X'Y), where X represented the design with a column of
ones added to the beginning and Y represented the response examined. This process
model became quite large, containing one intercept term, 11 control main effects, 11
control quadratic effects, 55 control by control interaction effects, three noise main
effects, and 33 control by noise interaction effects, totaling 114 terms. As previously
discussed, no noise by noise interactions were considered, based on suggestions by Myers
& Montgomery (2002). However, including the noise by noise interactions would have
increased the R² values and provided better fits.
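The term count quoted above follows directly from the numbers of control and noise variables. A minimal sketch of the bookkeeping (the function name `rpd_model_terms` is hypothetical):

```python
from math import comb

def rpd_model_terms(n_control, n_noise):
    """Count terms in the full model: no noise quadratics or noise x noise terms."""
    return {
        "intercept": 1,
        "control main": n_control,
        "control quadratic": n_control,
        "control x control": comb(n_control, 2),
        "noise main": n_noise,
        "control x noise": n_control * n_noise,
    }

counts = rpd_model_terms(11, 3)
print(counts["control x control"], counts["control x noise"], sum(counts.values()))
```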

Many of the 114 terms were insignificant according to the Analysis of Variance
(ANOVA) table; therefore, a backward stepwise regression approach was employed to
reduce the model. Once the full model was obtained, an ANOVA was
performed and the term with the highest p-value (provided that p-value exceeded 0.10) was
removed from the model. The process model was then recalculated to obtain new
coefficients and the term with the highest p-value was again removed. This process
continued until a reduced model was developed that contained only significant terms
(p-value <= 0.10). As a side note, the control main effects and the noise main effects were
never removed, in order to maintain hierarchy and to establish mean and variance models for
analysis. Removing a main effect would have implied that the setting for that particular
control variable had no effect on the overall response or variance of the response.
Due to the sizes of the reduced models, the models are not presented in this document.
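The elimination loop described above can be sketched as follows. This is a control-flow illustration only: `fit_pvalues` is a hypothetical caller-supplied function that refits the model and returns a p-value per term, and the p-values in the demo are made up rather than taken from the AutoGAD models.

```python
def backward_eliminate(terms, fit_pvalues, protected, alpha=0.10):
    """Drop the least significant unprotected term, refit, and repeat until
    every unprotected term has p-value <= alpha.

    fit_pvalues(terms) must refit the model on `terms` and return
    {term: p_value}; it is an assumed interface, not a library call.
    """
    terms = list(terms)
    while True:
        pvals = fit_pvalues(terms)
        candidates = [t for t in terms if t not in protected and pvals[t] > alpha]
        if not candidates:
            return terms
        worst = max(candidates, key=lambda t: pvals[t])
        terms.remove(worst)

# Demo with fixed, invented p-values (a real refit would change them each pass).
fake_p = {"x1": 0.40, "x2": 0.001, "z1": 0.30,
          "x1*x2": 0.25, "x1^2": 0.02, "x2*z1": 0.55}

def fake_fit(terms):
    return {t: fake_p[t] for t in terms}

kept = backward_eliminate(fake_p, fake_fit, protected={"x1", "x2", "z1"})
print(kept)
```

In the demo the main effects `x1` and `z1` survive despite large p-values because they are protected, mirroring the hierarchy rule stated in the text.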

Once the reduced model was obtained, it was then separated into its appropriate
mean and variance models as shown in Equations (2.7) and (2.8). These two equations
were employed as dual responses in an effort to minimize the variance and satisfy
constraints on the mean. Also, it is important to note that this research assumed
continuous control and noise variables. This dual response problem was solved using the
LT formulations in Equation (4.4):

$$\begin{aligned}
MSE_{min} &= \{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)] \\
MSE_{max} &= -\{\mu_z[y(x,z)]\}^2 + \sigma_z^2[y(x,z)] \\
MSE_{target} &= \{\mu_z[y(x,z)] - T\}^2 + \sigma_z^2[y(x,z)]
\end{aligned} \tag{4.4}$$

Time and FPF responses were minimized while TPF and TFP were maximized.
In every case, the corresponding function in Equation (4.4) was minimized. To search for the
optimal LT value and its settings, a complete enumeration of the integer control variables and
a coarse discretization of the remaining control variables was performed. This resulted in
320,000,000 combinations of the 11 control variables. MATLAB® calculated LT values
for this enumeration set in less than two hours. Other, possibly quicker, methods could be
employed; however, this enumeration technique was utilized to avoid falling into local
minimum solutions.
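A minimal sketch of the enumeration idea follows, written in Python rather than the MATLAB used in the study. The three LT functions follow Equation (4.4); the two-variable mean and variance models and the grids are invented for illustration and are not the fitted AutoGAD models.

```python
from itertools import product

def lt_min(mean, var):
    """LT for a response to be minimized."""
    return mean ** 2 + var

def lt_max(mean, var):
    """LT for a response to be maximized (still minimized as a function)."""
    return -(mean ** 2) + var

def lt_target(mean, var, target):
    """LT for a response with target value T."""
    return (mean - target) ** 2 + var

def enumerate_best(grids, mean_model, var_model, lt):
    """Evaluate LT at every grid combination and keep the smallest value."""
    best_x, best_lt = None, float("inf")
    for x in product(*grids):
        value = lt(mean_model(x), var_model(x))
        if value < best_lt:
            best_x, best_lt = x, value
    return best_x, best_lt

# Hypothetical models of two coded control variables (assumptions).
def mean_model(x):
    return 0.9 - 0.2 * (x[0] - 0.5) ** 2 + 0.05 * x[1]

def var_model(x):
    return 0.01 + 0.04 * (x[1] + 0.5) ** 2

grids = [
    [i / 4 for i in range(-4, 5)],  # continuous variable, coarse grid on [-1, 1]
    [-1, 0, 1],                     # integer-valued variable, fully enumerated
]
best_x, best_lt = enumerate_best(grids, mean_model, var_model, lt_max)
print(best_x, round(best_lt, 4))
```

Because every combination is evaluated, the search cannot stall in a local minimum of the grid, at the cost of a run time that grows with the product of the grid sizes.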

After obtaining the LT values for all possible combinations, the control settings
associated with the minimum LT value were chosen to represent the robust parameters.
The optimal control settings for the four different outputs tested are reported in Table 26,
as well as the expected mean, variance, and LT values. These control settings should
prove robust to new images introduced into AutoGAD.

Table 26. Optimal Settings for 4 Responses Suggested by QR

Control      Time     TPF      FPF      TFP
DA           2        -2       -2       -2
MST          6        6        6        6
BWSNR        0.01     0.01     0.01     0.08
PTSNR        1        1        1        5
BWI          0.1      0.01     0.01     0.1
TBS          0        0        0        0
CS           1        0        0        1
SIH          150      50       50       150
SIL          45       45       5        5
LSNR         14       14       4        14
WS           3        3        5        11

Outputs      Time     TPF      FPF      TFP
Pred. Mean   3.05     1.102    0.13     1.12
Pred. Var    12.033   0.189    0.45     0.398
Pred. LT     21.34    -1.026   0.4669   -0.8659

As seen in Table 26, none of the four responses shared the same optimal robust
settings. More importantly, the four responses only agreed on the same setting for one of
the eleven control variables: MST. The predicted means for TPF and TFP were
higher than one, which was infeasible. This reflected the lack of fit of the
quadratic models. To test the validity of the results in Table 26, these settings
were applied to the eight untested images. The results for each respective response were
summarized across the eight images by a mean and variance. Finally, to maintain
consistency with the selection of the settings, the LT values were calculated. The results
of testing the optimal settings suggested by quadratic regression are
presented in Table 27.

Table 27. Testing Results for Optimal Settings in QR

Response   Mean      Variance    LT
Time       39.0214   1590.1000   3112.80
TPF        0.9752    0.0004      -0.9507
FPF        0.1180    0.0030      0.0170
TFP        0.8739    0.0431      -0.7207

Prior  to  explanation  of  the  results  in  Table  27,  it  is  important  to  note  that  the  LT 
values  for  TPF  and  TFP  were  negative.  This  result  was  expected  due  to  the  construction 
of  a  maximized  response  LT  problem  in  Equation  (4.4).  The  designation  of  a  negative 
sign  on  the  mean-squared  term  of  the  LT  drove  the  LT  values  into  the  negative  response 
realm.  The  LT  value  was  still  minimized;  therefore,  the  settings  with  the  largest  negative 
value  were  optimal. 
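The sign behavior described above can be verified by recomputing the LT values from the means and variances reported in Table 27. The sketch below assumes only the LT formulations of Equation (4.4); small differences from the tabulated LT values are expected because the reported means and variances are rounded.

```python
def lt_min(mean, var):
    return mean ** 2 + var       # minimized responses (Time, FPF)

def lt_max(mean, var):
    return -(mean ** 2) + var    # maximized responses (TPF, TFP)

# (mean, variance, reported LT, LT function), values copied from Table 27.
table_27 = {
    "Time": (39.0214, 1590.1000, 3112.80, lt_min),
    "TPF": (0.9752, 0.0004, -0.9507, lt_max),
    "FPF": (0.1180, 0.0030, 0.0170, lt_min),
    "TFP": (0.8739, 0.0431, -0.7207, lt_max),
}

recomputed = {name: f(m, v) for name, (m, v, _, f) in table_27.items()}
for name, value in recomputed.items():
    print(name, round(value, 4), table_27[name][2])
```

For the maximized responses the squared mean dominates the variance, so the LT value is negative and a more negative value is better.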

The results given in Table 27 appear to be appropriate. TPF and TFP obtained
rather large mean values while maintaining low variance. The mean of FPF was higher
than desired, but its variance was very small. Finally, time appeared to do well
in terms of mean but exhibited an extremely large variance.

To aid in understanding the reasons time and FPF did not achieve great results, all
four ANOVA tables and residual plots were examined. Table 28 through Table 31 display the
ANOVA tables for each response. As seen in the tables, each model was significant;
however, lack of fit was prevalent. The strong significance in the lack of fit indicated the
quadratic regression model did not accurately fit the data, which led to false conclusions.

A coefficient of determination, or R², and its derivatives were utilized to assist in
the explanation of ANOVA. R², the proportion of variability in the data accounted for by
the model, was calculated as (Montgomery et al., 2004):

$$R^2 = 1 - \frac{SS_E}{SS_T} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n} \tag{4.5}$$

where $SS_E$ was the sum of squares of residuals, $SS_T$ was the total sum of squares, $y_i$
represented the true response values, and $\hat{y}_i$ the fitted response values for n treatment
rows (sample size).

The R² statistic increased as more terms were added to the model. Another
statistic, the adjusted R², only increased if the terms added to the model reduced the
MSE value. This statistic, for p terms (regressors), was calculated as:

$$adj(R^2) = 1 - \frac{SS_E/(n-p)}{SS_T/(n-1)} \tag{4.6}$$

Finally, a predicted R² statistic was utilized which gave an indication of how
well the model predicted new response data. This is similar to R², except that for each
residual (i), the model was fit to the remaining n-1 observations, thus leaving out the ith
observation. This type of residual is known as the PRESS residual, and the prediction
of the withheld observation is denoted as $\hat{y}_{(i)}$. This predicted R² was computed as:

$$pred(R^2) = 1 - \frac{PRESS}{SS_T} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2}{\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n} \tag{4.7}$$

These three R² statistics were examined in the ANOVA tables of each
response. In terms of time (Table 28), an R² value of 0.37 was obtained. This value was
extremely low, which may help to explain the poor results reported in Table 27. The
other three responses obtained R² values between 0.61 and 0.69. These values indicated
that quadratic regression had some ability to fit the given data. Their predicted R² values
fell in the same general range, thus allowing for decent prediction of optimal control
settings given new data. However, great results were not expected due to the lack of fit
significance.
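To make the three statistics concrete, the sketch below computes R², adjusted R², and predicted R² for a small straight-line fit, refitting the model n times to obtain the PRESS residuals. The data are invented and the one-regressor model is an assumption chosen to keep the example short; p counts the intercept and the slope.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def r2_statistics(xs, ys):
    n, p = len(xs), 2
    b0, b1 = fit_line(xs, ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum(y * y for y in ys) - sum(ys) ** 2 / n
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the withheld point.
        a0, a1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a0 + a1 * xs[i])) ** 2
    r2 = 1 - sse / sst
    adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    pred = 1 - press / sst
    return r2, adj, pred

xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ys = [0.1, 0.9, 1.2, 2.1, 2.4, 3.6]
r2, adj, pred = r2_statistics(xs, ys)
print(round(r2, 4), round(adj, 4), round(pred, 4))
```

On this data set the three values are ordered pred(R²) < adj(R²) < R², the same ordering seen in each of the ANOVA tables.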


Table 28. ANOVA for Time

Source        Sum of Squares   df      Mean Square   F Value   p-value (Prob > F)
Model         340.70           36      9.46          285.5     < 0.0001   significant
Residual      571.70           17246   0.03
Lack of Fit   570.01           16950   0.03          5.873     < 0.0001   significant
Pure Error    1.69             296     0.01
Cor Total     912.41           17282

R-Squared        0.3734
Adj R-Squared    0.3721
Pred R-Squared   0.3708


Table 29. ANOVA for TPF

Source        Sum of Squares   df      Mean Square   F Value   p-value (Prob > F)
Model         1435.25          60      23.92         451.40    < 0.0001   significant
Residual      912.90           17227   0.05
Lack of Fit   911.84           16931   0.05          14.95     < 0.0001   significant
Pure Error    1.07             296     0.00
Cor Total     2348.15          17287

R-Squared        0.6112
Adj R-Squared    0.6099
Pred R-Squared   0.6086

Table 30. ANOVA for FPF

Source        Sum of Squares   df      Mean Square   F Value   p-value (Prob > F)
Model         9105.15          72      126.46        537.95    < 0.0001   significant
Residual      4046.90          17215   0.24
Lack of Fit   4046.36          16919   0.24          130.79    < 0.0001   significant
Pure Error    0.54             296     0.00
Cor Total     13152.05         17287

R-Squared        0.6923
Adj R-Squared    0.6910
Pred R-Squared   0.6899


Table 31. ANOVA for TFP

Source        Sum of Squares   df      Mean Square   F Value   p-value (Prob > F)
Model         1840.41          62      29.68         438.30    < 0.0001   significant
Residual      1166.56          17225   0.07
Lack of Fit   1166.07          16929   0.07          41.17     < 0.0001   significant
Pure Error    0.50             296     0.00
Cor Total     3006.97          17287

R-Squared        0.6120
Adj R-Squared    0.6107
Pred R-Squared   0.6095


Figure 30 displays the normal probability plots for each response: Time (top-left),
TPF (top-right), FPF (bottom-left), and TFP (bottom-right). As seen in the first plot
(time), there was a heavy-tailed distribution. This phenomenon caused the low R² value
and significant lack of fit. Quadratic regression was not suggested for this response. The
normal probability plots for the other three responses all indicated a light-tailed
distribution, which indicated a slight departure from the normality assumption crucial to
ANOVA. However, the residuals fell near the line and decent R² values were
maintained.

[Figure 30: normal probability plots of the residuals; vertical axes labeled Normal % Probability]

Figure 30. Residual Plots for Time, TPF, FPF, TFP

4.4.3 AutoGAD ANN RPD


Traditional RPD methodology was demonstrated on single response problems
using quadratic regression, as shown in Section 4.4.2. As discussed in the previous
section, the R² values were low and the lack of fit was significant for all four responses.
Therefore, quadratic regression failed to properly fit the data, which may have produced
non-optimal robust settings as a result. Artificial neural networks (ANNs) were instituted
at this juncture to provide better fits to these seemingly nonlinear problems.

In order to construct the ANN to fit the data, the ANN needed to be trained
properly. To train the neural network, a hold-out method was utilized. This method
withheld one-third of the data for testing of the neural network and employed two-thirds
for training (Kuncheva, 2004). Withholding data from training allowed the spread
parameter of the network to be adjusted to its optimal setting. The withheld
data was tested at each spread increment and the root mean square error (RMSE) was
calculated on the test set. The spread value with the lowest RMSE was chosen and the
network was considered trained. An example spread versus RMSE plot is given in
Figure 31. Since two outputs (mean and variance) were calculated, a spread value and
RMSE were attributed to each output. Typically, the optimal spread was the same for
each output; however, if they differed, a tradeoff assessment was necessary. The plot in
Figure 31 displays the spread versus RMSE for mean and variance for the output TPF.
The optimal spread value was 0.85 for this response.

Figure 31. Spread vs. RMSE for TPF

The hold-out method employed placed the ANNs at a disadvantage with
respect to quadratic regression. When training the network, ANNs only utilized two-thirds
of the data, whereas quadratic regression utilized the entire data set for training.
Despite this disadvantage, better results were obtained using ANNs.
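The spread selection by hold-out RMSE can be sketched with a small general regression neural network written from scratch. This is an illustration under stated assumptions: the kernel uses the textbook Gaussian form exp(-d²/(2·spread²)), which may be scaled differently from the software used in the study, and the data are synthetic (y = x² on [-1, 1]) rather than AutoGAD output.

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * spread ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed; fall back to the plain average
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def select_spread(train_x, train_y, test_x, test_y, spreads):
    """Return (lowest hold-out RMSE, spread that achieved it)."""
    scores = []
    for s in spreads:
        pred = [grnn_predict(x, train_x, train_y, s) for x in test_x]
        scores.append((rmse(pred, test_y), s))
    return min(scores)

# Synthetic one-input data; every third point is withheld (7 of 21).
xs = [(-1 + 0.1 * i,) for i in range(21)]
ys = [x[0] ** 2 for x in xs]
test_idx = set(range(1, 21, 3))
train_x = [x for i, x in enumerate(xs) if i not in test_idx]
train_y = [y for i, y in enumerate(ys) if i not in test_idx]
test_x = [x for i, x in enumerate(xs) if i in test_idx]
test_y = [y for i, y in enumerate(ys) if i in test_idx]

spreads = [0.05 * k for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
best_rmse, best_spread = select_spread(train_x, train_y, test_x, test_y, spreads)
print(round(best_spread, 2), round(best_rmse, 4))
```

When mean and variance are modeled as two outputs, the same loop would be run once per output, and any disagreement between the two best spreads would call for the tradeoff assessment mentioned in the text.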

Since a crossed array design was implemented on AutoGAD, the “pre-processing”
ANN approach was employed due to its ability to fit a network in less time
than the “post-processing” approach in the absence of noise variables as inputs. This
reduced the number of inputs from 14 to 11, which decreased the network calculation
time. As depicted previously in Figure 16, only the control variables were input and
mean and variance values were output for every control variable combination. For
each mean and variance value with respect to each unique combination of control
variable settings, an LT statistic was calculated.
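The pre-processing step amounts to collapsing the crossed array so that each control combination is summarized by the mean and variance of its response across the noise runs. A minimal sketch follows; the responses are made up, and the use of the sample variance (divisor n-1) is an assumption, since the text does not state which variance estimator was used.

```python
from collections import defaultdict
from statistics import mean, variance

def collapse_crossed_array(runs):
    """runs: iterable of (control_tuple, noise_tuple, response).

    Returns {control_tuple: (mean, sample variance)} taken across noise runs.
    """
    by_control = defaultdict(list)
    for control, _noise, y in runs:
        by_control[control].append(y)
    return {c: (mean(v), variance(v)) for c, v in by_control.items()}

# Two hypothetical control settings, each observed at four noise settings.
runs = [
    ((-1, 1), (-1, -1), 0.90), ((-1, 1), (-1, 1), 0.94),
    ((-1, 1), (1, -1), 0.92), ((-1, 1), (1, 1), 0.96),
    ((1, 1), (-1, -1), 0.70), ((1, 1), (-1, 1), 0.98),
    ((1, 1), (1, -1), 0.75), ((1, 1), (1, 1), 0.97),
]
targets = collapse_crossed_array(runs)
for control, (m, v) in targets.items():
    # LT for a maximized response, as in Equation (4.4).
    print(control, round(m, 4), round(v, 5), round(-(m ** 2) + v, 4))
```

The resulting (mean, variance) pairs are the two network outputs, and the control tuples are the only network inputs, which is why the input count drops from 14 to 11.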

In Section 4.4.2, a summary of the enumerated search performed on the setting
space to determine optimal LT solutions was provided. The same principle was applied
at this point, but due to the construction of the MATLAB® code for neural networks, a
coarser and reduced discretized data set was utilized. The testing set was reduced to
4,000,000 possible combinations as a result; however, an appropriate number of settings
was tested for each control variable.

Implementing  the  smaller  exhaustive  search  placed  the  ANN  at  another 
disadvantage  compared  to  quadratic  regression.  First,  the  neural  network  used  less  data 
(only  two-thirds)  to  train  and  fit  appropriate  mean  and  variance  models.  Now,  fewer 
possible  combinations  were  searched  in  the  LT  space  due  to  time  restrictions.  However, 
the  example  problem  tested  in  Section  3.4.2  employed  the  same  hold-out  principle  and 
reduced  enumerated  search,  yet  yielded  superior  results  to  quadratic  regression. 

The four response problems were examined separately and optimal robust settings
were obtained for each response. The results obtained using RBFNNs are reported in
Table 32 and the optimal robust settings obtained using GRNNs are reported in Table 33.
Along with the settings, the predicted mean, variance, and LT values are displayed from
training the ANNs.

Table 32. Optimal Settings for Four Responses (RBFNN)

Control      Time     TPF       FPF         TFP
DA           -2       0         -2          -2
MST          14       6         6           6
BWSNR        0.1      0.07      0.1         0.1
PTSNR        6        3.5       6           6
BWI          0.01     0.04      0.1         0.1
TBS          0        1         0           0
CS           1        1         1           1
SIH          50       83        150         150
SIL          45       25        45          45
LSNR         4        11        14          14
WS           3        3         11          11

Outputs      Time     TPF       FPF         TFP
Pred. Mean   6.9526   0.9933    0.0003      0.9058
Pred. Var    82.438   0.0002    6.278E-07   0.0529
Pred. LT     130.77   -0.9865   7.238E-07   -0.7676

Table 33. Optimal Settings for Four Responses (GRNN)

Control      Time     TPF      FPF         TFP
DA           -2       2        -2          -2
MST          14       6        14          6
BWSNR        0.1      0.01     0.1         0.1
PTSNR        1        6        1           6
BWI          0.01     0.01     0.1         0.1
TBS          1        1        0           1
CS           0        0        1           1
SIH          150      50       50          150
SIL          45       5        45          45
LSNR         4        4        14          14
WS           11       11       11          11

Outputs      Time     TPF      FPF         TFP
Pred. Mean   6.61     0.9864   0.0007      0.9154
Pred. Var    83.625   0.001    3.68E-06    0.0479
Pred. LT     127.31   -0.972   4.212E-06   -0.7901

To determine the validity of using ANNs over quadratic regression, the settings in
Table 32 and Table 33 were tested in AutoGAD on the eight images withheld from training. Mean and
variance were calculated for each combination of settings across all images tested. From
the mean and variance, the LT statistic was calculated. The LT statistic was employed to
maintain consistency since it was used to determine the optimal robust settings. The
results for RBFNNs are reported in Table 34 and GRNN results are displayed in Table 35
for the four responses.


Table 34. Testing Results for Optimal Settings using RBFNN

Response   Mean     Variance   LT
Time       29.4     742.9      1606.5
TPF        0.9644   0.0019     -0.9282
FPF        0.0005   0.0000     0.0000
TFP        0.9158   0.0226     -0.8161

Table 35. Testing Results for Optimal Settings using GRNN

Response   Mean     Variance   LT
Time       24.1     662.7      1245.7
TPF        0.9806   0.0010     -0.9606
FPF        0.0002   0.0000     0.0000
TFP        0.9074   0.0232     -0.8002

As seen in Table 34 and Table 35, appropriate results were obtained. To measure
the “goodness” of these results, they were compared to those obtained using quadratic
regression. A comparison of the settings for quadratic regression and the ANNs is shown
in Table 36. The test results for the robust settings obtained with quadratic
regression and with the ANNs are displayed in Table 37 through Table 40.


Table 36. Optimal Settings for QR and ANNs

          |        Time         |         TPF         |         FPF         |         TFP
Control   | QR     RBFNN  GRNN  | QR     RBFNN  GRNN  | QR     RBFNN  GRNN  | QR     RBFNN  GRNN
DA        | 2      -2     -2    | -2     0      2     | 2      -2     -2    | -2     -2     -2
MST       | 6      14     14    | 6      6      6     | 6      6      14    | 6      6      6
BWSNR     | 0.01   0.1    0.1   | 0.01   0.07   0.01  | 0.01   0.1    0.1   | 0.08   0.1    0.1
PTSNR     | 1      6      1     | 1      3.5    6     | 1      6      1     | 5      6      6
BWI       | 0.1    0.01   0.01  | 0.01   0.04   0.01  | 0.01   0.1    0.1   | 0.1    0.1    0.1
TBS       | 0      0      1     | 0      1      1     | 1      0      0     | 0      0      1
CS        | 1      1      0     | 0      1      0     | 0      1      1     | 1      1      1
SIH       | 150    50     150   | 50     83     50    | 50     150    50    | 150    150    150
SIL       | 45     45     45    | 45     25     5     | 45     45     45    | 5      45     45
LSNR      | 14     4      4     | 14     11     4     | 14     14     14    | 14     14     14
WS        | 3      3      11    | 3      3      11    | 3      11     11    | 11     11     11

Table 37. Testing Results for QR and ANNs on Time

Time
Method   Mean    Variance   LT
QR       39.02   1590.1     3112.8
RBFNN    29.39   742.9      1606.5
GRNN     24.15   662.7      1245.7

Table 38. Testing Results for QR and ANNs on TPF

TPF
Method   Mean     Variance   LT
QR       0.9752   0.0004     -0.9507
RBFNN    0.9644   0.0019     -0.9282
GRNN     0.9806   0.0010     -0.9606

Table 39. Testing Results for QR and ANNs on FPF

FPF
Method   Mean     Variance   LT
QR       0.1991   0.0070     0.0466
RBFNN    0.0005   0.0000     0.0000
GRNN     0.0002   0.0000     0.0000

Table 40. Testing Results for QR and ANNs on TFP

TFP
Method   Mean     Variance   LT
QR       0.8739   0.0431     -0.7207
RBFNN    0.9158   0.0226     -0.8161
GRNN     0.9074   0.0232     -0.8002

As  seen  in  Table  36,  extremely  different  settings  were  obtained  using  quadratic 
regression  and  ANNs,  with  the  possible  exception  of  TFP.  This  indicated  that  the  two 
methods  of  modeling  the  mean  and  variance  were  quite  different. 

In terms of time (Table 37), ANNs outperformed quadratic regression. The
ANNs reduced the average time by roughly 10 to 15 seconds and cut the variance by more than half.
GRNNs obtained a better result than the RBFNNs, with an LT value 22.5 percent
lower than the RBFNN value; both were far below the LT obtained with quadratic regression. These results
suggested that ANNs were clearly preferred to quadratic regression, especially since an
R² value of only 0.37 was obtained for the quadratic model.
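The relative changes for time can be recomputed directly from the values in Table 37. The helper name `percent_reduction` is hypothetical; the numbers are copied from the table.

```python
def percent_reduction(baseline, value):
    """Percent by which `value` falls below `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Values copied from Table 37.
lt = {"QR": 3112.8, "RBFNN": 1606.5, "GRNN": 1245.7}
mean_time = {"QR": 39.02, "RBFNN": 29.39, "GRNN": 24.15}

saved_grnn = mean_time["QR"] - mean_time["GRNN"]     # seconds saved, GRNN vs QR
rbfnn_vs_qr = percent_reduction(lt["QR"], lt["RBFNN"])
grnn_vs_qr = percent_reduction(lt["QR"], lt["GRNN"])
grnn_vs_rbfnn = percent_reduction(lt["RBFNN"], lt["GRNN"])

print(round(saved_grnn, 2), round(rbfnn_vs_qr, 1),
      round(grnn_vs_qr, 1), round(grnn_vs_rbfnn, 1))
```

With these table values, the 22.5 percent figure corresponds to the GRNN LT measured against the RBFNN LT, while both networks cut the quadratic regression LT by roughly half or more.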

TPF  displayed  the  only  possible  scenario  where  quadratic  regression  was 
competitive.  As  seen  in  Table  38,  quadratic  regression  obtained  a  lower  (higher 
negative)  LT  value  than  RBFNNs,  but  GRNNs  were  still  preferred  by  obtaining  the 
lowest  LT  value.  Although  RBFNNs  performed  the  worst  in  this  scenario,  their  results 
remained  competitive  since  they  achieved  similar  means  with  slightly  larger  variances. 

FPF conveyed strong results for ANNs as shown in Table 39. An LT value of
almost zero was obtained when either ANN was used, which provided a nearly perfect
LT value. Quadratic regression maintained low variance, but the mean value ballooned
to 0.11, which indicated the presence of multiple false positive identifications.

Finally,  the  results  shown  in  Table  40  represent  superior  results,  in  general,  for 
ANNs  as  compared  to  quadratic  regression  for  TFP.  Compared  to  quadratic  regression, 
the  RBFNNs  and  GRNNs  saw  an  increased  mean  of  4.8  and  3.8  percent,  respectively, 
while  the  variance  was  reduced.  This  resulted  in  lower  LT  values  for  each  ANN. 

The results presented in Table 37 through Table 40 provided strong evidence of the
usefulness of ANNs when quadratic regression fails to properly fit the data. The extreme
case was seen with time, since an R² value of 0.37 was obtained. However, the other
three responses achieved R² values in the range of 0.60-0.70, yet ANNs still performed
better than or the same as quadratic regression. When dealing with problems containing
highly nonlinear responses, as suggested in the AutoGAD problem, ANNs should be
considered as an alternative to traditional quadratic regression.

4.5 AutoGAD and System Degradation

Optimal  robust  settings  for  the  four  responses  of  AutoGAD  using  quadratic 
regression  and  ANNs  were  presented  in  Sections  4.4.2  and  4.4.3.  These  settings  were 
expected  to  be  robust  under  normal  operating  conditions  of  the  algorithm.  However,  new 
robust  settings  were  necessary  to  guard  against  system  degradation.  In  this  section,  the 
performances  of  doubly  robust  settings  are  examined. 

As explained in Section 3.3, system degradation in software can be represented by
exposure to inputs beyond its experience and training. In AutoGAD, this equated to
a new image considered to be noisier than any image on which training was performed.
A signal-to-noise ratio (SNR) was calculated for all eight available images. The image
with the largest SNR value was selected as the “noisiest” image. This corresponded to
Image 6 of the available images. If this image was withheld from AutoGAD training, it
was expected that AutoGAD would perform relatively poorly with respect to this image.
That is to say, degraded performance was expected. It is important to note that although
this image was the “noisiest,” it was not necessarily much noisier than the other images.
In other words, it was not truly an outlier compared to the other seven images. As a side
note, all analysis performed in the previous sections contained both halves of Image 6 in
the testing set; therefore, the same training data was utilized in this section.

For simplicity, only two responses were utilized to test AutoGAD under system
degradation using both quadratic regression and ANNs. TPF and TFP were chosen as the
responses, since both quadratic regression and ANNs performed well in determining their
appropriate optimal settings. Time was not considered due to the substantial lack of fit
that quadratic regression displayed for this response. Also, FPF was not chosen since
values of zero were obtained for nearly every image.

4.5.1  Doubly  Robust  Settings  in  Quadratic  Regression 

Quadratic regression was presented in Section 4.4.2 in which optimal robust
settings for TPF and TFP, as given in Table 26, had associated LT values of Y* = -0.9507
and Y* = -0.7207, respectively. Recall, these values were negative due to both responses
being maximized. To determine the doubly robust settings, the algorithm outlined in
Figure 10 was employed.

The  doubly  robust  algorithm  was  conducted  until  7**  =  -0.7605  and 
7”  =  -0.5766  was  realized,  which  was  a  20  percent  increase  in  LT.  x*was  equal  to  the 
settings  given  in  Table  26  corresponding  to  TPF  and  TFP.  The  coefficients  ( Cokl  j  to 

begin  the  gradient  search  corresponded  to  the  coefficients  used  to  construct  the  LT 
statistic  to  solve  for  x* .  Once  the  coefficients  were  obtained,  the  derivative  of  LT  (7) 

dY 

was  taken  with  respect  to  these  coefficients  (C),  V7  =  —  .  A  step  was  then  taken  in 

DC  gold 

the  gradient  direction  and  the  new  dual  response  problem  was  solved  to  find  the 
minimum  LT  statistic  and  its  corresponding  control  variable  settings.  This  process  was 
repeated  until  7**  was  realized.  The  optimal  control  variable  settings  associated  with 
7”  were  considered  to  be  doubly  robust  settings.  These  settings  should  prove  robust  to 
noise  variables  as  well  as  robust  to  system  degradation. 
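
The loop described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the algorithm of Figure 10: the one-variable toy model, the coefficient values, the grid search, the finite-difference gradient, and the fixed step length are all hypothetical stand-ins chosen to keep the example small.

```python
import numpy as np

def lt(x, c):
    """Toy LT statistic for a maximized response (assumed form): -(mean)^2 + variance."""
    mean = c[0] + c[1] * x + c[2] * x**2      # hypothetical quadratic mean model
    var = (c[3] + c[4] * x) ** 2              # hypothetical variance model
    return -mean**2 + var

def solve_robust(c, grid):
    """Solve the dual response problem by grid search: the x that minimizes LT."""
    values = lt(grid, c)
    i = int(np.argmin(values))
    return float(grid[i]), float(values[i])

def grad_c(x, c, h=1e-6):
    """Central finite-difference gradient of LT with respect to the coefficients C."""
    g = np.zeros_like(c)
    for j in range(len(c)):
        e = np.zeros_like(c)
        e[j] = h
        g[j] = (lt(x, c + e) - lt(x, c - e)) / (2 * h)
    return g

def doubly_robust(c_old, grid, increase=0.20, step=0.01, max_iter=10_000):
    """Step the coefficients along the LT gradient until the minimum LT has
    worsened by the requested fraction, re-solving for x after every step."""
    x_star, y_star = solve_robust(c_old, grid)
    y_target = y_star + increase * abs(y_star)   # Y** expressed as a target
    c, x, y = c_old.copy(), x_star, y_star
    for _ in range(max_iter):
        if y >= y_target:
            break
        g = grad_c(x, c)
        c = c + step * g / np.linalg.norm(g)     # move toward worse (larger) LT
        x, y = solve_robust(c, grid)
    return x_star, y_star, x, y, c

c_old = np.array([0.9, 0.1, -0.1, 0.02, 0.05])   # illustrative coefficients only
grid = np.linspace(-1.0, 1.0, 201)
x_star, y_star, x_dr, y_dr, c_new = doubly_robust(c_old, grid)
print(f"x* = {x_star:.2f}, Y* = {y_star:.4f}; doubly robust x = {x_dr:.2f}, Y** = {y_dr:.4f}")
```

In this sketch the settings returned as `x_dr` play the role of the doubly robust settings: they are optimal for the degraded coefficients rather than for the original fit.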

The  doubly  robust  settings  and  original  optimal  robust  settings  were  tested  against 
Image 6, which represented system degradation in AutoGAD. Eight replications of the
two combinations of settings were tested against both halves of Image 6 to obtain a mean,
variance,  and  LT  value.  The  original  robust  settings  as  well  as  the  doubly  robust  settings 
for  output  TPF  are  displayed  in  Table  41.  The  results  of  these  settings  in  TPF  against 
Image  6  are  reported  in  Table  42. 

Table 41.  Original and Doubly Robust Settings for TPF

                     TPF
Control    Original    Doubly Robust
DA         -2          -2
MST        6           7
BWSNR      0.01        0.09
PTSNR      1           3
BWI        0.01        0.01
TBS        0           1
CS         0           0
SIH        50          150
SIL        45          45
LSNR       14          8
WS         3           3

Table 42.  Image 6 Results for TPF Settings

TPF - Quadratic Regression    Mean      Variance    LT
Original                      0.9627    0.0005      -0.9263
Doubly Robust                 0.9634    0.0004      -0.9278
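
As a quick arithmetic check, the LT values in Table 42 appear to follow from the reported mean and variance as LT = -(mean)^2 + variance for a maximized response. That form is inferred here from the tabulated numbers and is an assumption, since the LT definition is given elsewhere in the document; the function name is hypothetical.

```python
def lt_maximized(mean, variance):
    """Assumed LT form for a maximized response: -(mean)^2 + variance."""
    return -mean**2 + variance

# (mean, variance, reported LT) taken from Table 42
table_42 = {
    "Original": (0.9627, 0.0005, -0.9263),
    "Doubly Robust": (0.9634, 0.0004, -0.9278),
}

for name, (mean, variance, reported) in table_42.items():
    computed = lt_maximized(mean, variance)
    print(f"{name}: computed {computed:.4f}, reported {reported:.4f}")
```

The computed and reported values agree to within the rounding of the tabulated mean and variance.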

The results shown in Table 42 validated the use of doubly robust settings in the
presence of system degradation. By moving to a 20 percent increase in LT, the doubly
robust settings guarded against images that caused degradation. The doubly robust
settings obtained a larger mean (desired) and a smaller variance than the original robust
settings for Image 6. Recall that Image 6 was not necessarily a very noisy image, which
explains why the results did not differ as much as might have been expected. This changed,
however, when ANNs were examined.

The same analysis was performed on TFP, with a slight change in the gradient
step. Originally, a 20 percent increase in LT was utilized, but this step size was deemed
too large an increase for this problem: a substantial (undesired) increase in LT value
was obtained with the resulting doubly robust settings. The appropriate step size will vary
from problem to problem. Therefore, instead of a 20 percent increase in LT, a five percent
increase in LT was examined. The settings for both the original and doubly robust cases with
respect to TFP are displayed in Table 43, and the results obtained when these settings were
applied to Image 6 are reported in Table 44. Recall that eight replications were performed on
Image 6 to calculate a mean, variance, and LT statistic for each combination of settings.
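
Because the LT values here are negative, a "percent increase in LT" moves the value toward zero. A small sketch of that arithmetic follows; the helper name is hypothetical, and the five percent target for TFP is derived here for illustration rather than quoted from the text.

```python
def degraded_target(y_star, increase):
    """Target LT after a fractional increase; for negative LT this moves toward zero."""
    return y_star + increase * abs(y_star)

# Y* values for TPF and TFP from Section 4.5.1
print(f"TPF, 20%: {degraded_target(-0.9507, 0.20):.4f}")  # text reports -0.7605
print(f"TFP, 20%: {degraded_target(-0.7207, 0.20):.4f}")  # text reports -0.5766
print(f"TFP,  5%: {degraded_target(-0.7207, 0.05):.4f}")  # derived, not stated in the text
```

The 20 percent targets reproduce the reported values to within rounding of the last digit.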

Table 43.  Original and Doubly Robust Settings for TFP

                     TFP
Control    Original    Doubly Robust
DA         -2          -2
MST        14          14
BWSNR      0.1         0.1
PTSNR      6           6
BWI        0.08        0.09
TBS        0           0
CS         1           1
SIH        150         150
SIL        5           5
LSNR       14          14
WS         3           11

Table 44.  Image 6 Results for TFP Settings

TFP - Quadratic Regression    Mean      Variance    LT
Original                      0.9206    0.0110      -0.8365
Doubly Robust (5%)            0.9934    0.0004      -0.9864

Once again, the results showed that doubly robust settings were superior to the
original robust settings when faced with system degradation. In fact, applying doubly
robust settings to Image 6 in terms of TFP yielded almost perfect results and improved
(lowered) the LT value significantly. These results demonstrated the utility of doubly robust
settings over typical robust settings to guard against system degradation. The same
analysis using ANNs rather than quadratic regression was applied to system degradation,
as explained in the next section.

4.5.2  Doubly  Robust  Settings  in  ANNs 

ANNs  were  shown  to  be  the  preferred  choice  over  quadratic  regression  in  Section 
4.4.3.  In  what  follows,  doubly  robust  settings  were  also  calculated  using  the  gradient 
method  outlined  in  Figure  23.  The  same  responses,  TPF  and  TFP,  were  examined  to 
determine  doubly  robust  settings  for  RBFNNs  and  GRNNs.  Also,  the  same  20  percent 
increase  in  LT  for  TPF  and  five  percent  LT  increase  for  TFP  were  maintained. 

Gradient  analysis  was  applied  to  RBFNNs  on  the  response  TPF.  The  original 
robust  settings  and  the  doubly  robust  settings  are  displayed  in  Table  45  for  comparison. 
The  results  obtained  when  these  two  settings  were  applied  to  both  halves  of  Image  6  over 
eight  replications  are  reported  in  Table  46.  As  seen  in  this  table,  the  doubly  robust 
settings  achieved  a  better  LT  value.  Approximately  a  one  percent  increase  was  seen  in 
the  mean  while  a  slight  reduction  of  variance  was  observed,  which  led  to  the  lower  LT 
statistic  collected  with  the  doubly  robust  settings.  This  further  warranted  the  use  of 
doubly robust settings in the presence of possible system degradation.

Table 45.  RBFNN Original and Doubly Robust Settings for TPF

                     TPF
Control    Original    Doubly Robust
DA         0           0
MST        6           6
BWSNR      0.07        0.07
PTSNR      3.5         3.5
BWI        0.07        0.04
TBS        0           0
CS         1           1
SIH        83          83
SIL        25          25
LSNR       11          11
WS         3           3

Table 46.  RBFNN Image 6 Results on TPF

TPF - RBFNN      Mean      Variance    LT
Original         0.9649    0.0005      -0.9304
Doubly Robust    0.9774    0.0004      -0.9549

Gradient analysis was then applied to GRNNs on TPF. The original robust settings and
doubly robust settings are displayed in Table 47 and the results when applied to Image 6
are reported in Table 48. This was one of the few instances in which the doubly robust
settings failed to outperform the original settings. As depicted in Table 48, the original
robust settings were nearly perfect in determining all targets within the image. This
situation made it difficult for any possible new settings to outperform the original
settings. Although the doubly robust settings failed to outperform the original robust
settings, the difference was minimal: roughly one percent or less between the means and
between the LT statistics of the two solutions. Therefore, either combination of these
settings was deemed appropriate for this new image.

Table 47.  GRNN Original and Doubly Robust Settings on TPF

                     TPF
Control    Original    Doubly Robust
DA         2           -2
MST        6           6
BWSNR      0.1         0.1
PTSNR      6           1
BWI        0.01        0.01
TBS        0           1
CS         1           1
SIH        150         50
SIL        5           5
LSNR       14          14
WS         3           3

Table 48.  GRNN Image 6 Results on TPF

TPF - GRNN       Mean      Variance    LT
Original         0.9941    0.0000      -0.9881
Doubly Robust    0.9891    0.0000      -0.9782

As in quadratic regression, only TPF and TFP were examined. Time resulted in
large variance values and a significant lack of fit with quadratic regression. FPF
consistently maintained a mean value near zero for all images, making degradation
difficult to capture. Therefore, to remain consistent with quadratic regression, the
response TFP was the only other response examined. First, RBFNN gradient analysis
was applied to this response. The original robust settings and doubly robust settings for
TFP are presented in Table 49. The results when these settings were tested against Image
6, with eight replications performed, are given in Table 50. As seen in Table 50, a
significant increase in performance was obtained using doubly robust settings as
compared to the original robust settings. The mean increased by more than two percent
and the LT statistic improved by about five percent. Also, the variance was reduced by
two-thirds through the use of doubly robust settings.

Table 49.  RBFNN Original and Doubly Robust Settings on TFP

                     TFP
Control    Original    Doubly Robust
DA         -2          -2
MST        6           6
BWSNR      0.1         0.1
PTSNR      6           6
BWI        0.1         0.1
TBS        0           1
CS         1           1
SIH        150         150
SIL        45          45
LSNR       14          14
WS         11          11

Table 50.  RBFNN Image 6 Results on TFP

TFP - RBFNN      Mean      Variance    LT
Original         0.9641    0.0012      -0.9283
Doubly Robust    0.9879    0.0004      -0.9756

GRNN gradient analysis was also applied to TFP. The two combinations of
settings are displayed in Table 51 and the results when replicated on Image 6 are
presented in Table 52. As seen with GRNNs in TPF, doubly robust settings were unable
to perform better than the original robust settings. The original robust settings already
performed nearly perfectly, which made it very unlikely that a different combination of
settings would outperform them. However, the doubly robust settings remained competitive.

Table 51.  GRNN Original and Doubly Robust Settings on TFP

                     TFP
Control    Original    Doubly Robust
DA         2           -2
MST        6           6
BWSNR      0.1         0.1
PTSNR      6           1
BWI        0.1         0.1
TBS        0           0
CS         1           1
SIH        150         150
SIL        45          5
LSNR       14          14
WS         11          11

Table 52.  GRNN Image 6 Results on TFP

TFP - GRNN       Mean      Variance    LT
Original         0.9716    0.0009      -0.9431
Doubly Robust    0.9613    0.0007      -0.9235

The  results  reported  in  this  section,  as  well  as  Section  4.5.1,  validated  the  use  of 
doubly  robust  settings  to  guard  against  system  degradation.  The  image  utilized  in 
AutoGAD  to  demonstrate  system  degradation  was  not  the  optimal  choice  of  the  author  in 
terms  of  “noisiness,”  but  proved  useful  in  validating  the  proposed  technique  of 
determining  new  doubly  robust  settings  in  RPD.  If  new  images  were  produced  in  the 
future  with  “noisier”  situations,  better  results  would  be  expected  by  using  these  doubly 
robust  settings,  especially  with  GRNNs. 


4.6  AutoGAD and Factor Analysis

In Sections 4.4 and 4.5, methodology was presented on AutoGAD with each
response as its own problem. This resulted in different optimal settings for each
response. However, the user was interested in optimizing all four responses with one
combination of settings. Ideally, this meant detecting all targets with no false positives in
as little time as possible. Possessing a fast algorithm with poor detection abilities or
running a very good detection algorithm at the cost of large amounts of time was
undesirable. Therefore, it was necessary to determine settings for which solving for
simultaneous responses yielded results that were appropriate for all four responses given
any image.

To  circumvent  the  need  for  subject  matter  experts  or  simple  adding/subtracting  of 
response  values,  factor  analysis  was  employed  to  reduce  the  four  responses  into  a  single 
dimensional  response.  Eight  different  linear  combination  methodologies  were  presented 
in  Table  17  in  Section  3.5.1  which  reduced  the  factor  scores  into  a  single  dimension  if 
multiple  factors  were  retained. 

Prior to conducting factor analysis, two of the responses were transformed to
allow for simpler analysis. First, an inverse transformation was applied to time in order
to make it a maximized response. Second, the transformation (1 - FPF) was applied to FPF,
which also made it a maximized response. Therefore, all four responses were maximized, thus
making analysis of the factor scores easier to understand.
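
The two transformations can be sketched as follows. This assumes "inverse transformation" means the reciprocal 1/time; the function name and the sample values are hypothetical and are not AutoGAD output.

```python
def to_maximized(time_value, fpf):
    """Turn the two minimized responses into maximized ones (assumed forms)."""
    return 1.0 / time_value, 1.0 - fpf

# Two hypothetical runs: (time, FPF)
fast_clean = to_maximized(1200.0, 0.001)
slow_noisy = to_maximized(1800.0, 0.004)

# After the transformation, larger is better for both responses.
print(fast_clean, slow_noisy)
```

With both responses transformed, the better run (faster, fewer false positives) scores higher on each, which puts all four responses on the same "larger is better" footing.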

To begin factor analysis, eigenvalues were calculated, which determined the
amount of variation explained by the different factors. Determining these eigenvalues
through principal component analysis allowed for a designation of the number of factors
to retain. Three popular methods were considered for deciding how many factors to
retain: the Kaiser Criterion, scree plots, and variance explained by factors (Dillon &
Goldstein, 1984).

The Kaiser Criterion eliminated all factors that had an associated eigenvalue
below one. This method can underestimate the number of factors necessary if
eigenvalues of other factors score near one. Scree plots counteracted this phenomenon by
examining a plot of eigenvalues versus factors. A cutoff point was selected where a
noticeable drop in eigenvalue occurred, which retained some factors that scored close to
one. Finally, the number of factors retained was related to the minimum amount of
variance the user deemed necessary to explain. For instance, if 80 percent of the variance
was necessary, factors were added until this 80 percent explanation of variance was
achieved. These approaches were applied in this research to choose the appropriate
number of factors and to compare their overall results.

The eigenvalues for the response data in AutoGAD are displayed in Table 53. Also
displayed is the amount of variance explained by each factor (according to its
eigenvalue). According to Kaiser’s Criterion, only two factors were retained. This
amounted to approximately 66 percent of the variance explained.


Table 53.  Eigenvalues of Factors for AutoGAD

                  Factor 1    Factor 2    Factor 3    Factor 4
Eigenvalue        1.5006      1.1612      0.9069      0.4314
Var. Explained    0.3752      0.2903      0.2267      0.1078

As seen in Table 53, two eigenvalues were greater than one. However, the
eigenvalue related to Factor 3 was 0.9069, which was very close to one. Retaining this
factor allowed for nearly 90 percent of the variance to be explained, and was suggested
by a scree plot since a noticeable drop occurred between Factor 3 and Factor 4. The scree
plot also coincided with the third method, in which an appropriate amount of variance was
explained. Therefore, Kaiser’s Criterion retained two factors and scree plots retained
three factors, so both sets of factors were examined.
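
The three retention rules can be checked against the eigenvalues of Table 53 with a short sketch. The function names are hypothetical, and the "largest drop" rule is only a crude numerical stand-in for reading a scree plot by eye.

```python
eigenvalues = [1.5006, 1.1612, 0.9069, 0.4314]  # Table 53

def variance_explained(eigs):
    """Share of total variance attributed to each factor."""
    total = sum(eigs)
    return [e / total for e in eigs]

def kaiser_count(eigs, cutoff=1.0):
    """Kaiser Criterion: keep factors whose eigenvalue exceeds the cutoff."""
    return sum(1 for e in eigs if e > cutoff)

def scree_count(eigs):
    """Crude scree proxy (assumption): keep factors before the largest drop."""
    drops = [eigs[i] - eigs[i + 1] for i in range(len(eigs) - 1)]
    return drops.index(max(drops)) + 1

def count_for_variance(eigs, required):
    """Smallest number of factors whose cumulative share reaches `required`."""
    cumulative = 0.0
    for k, share in enumerate(variance_explained(eigs), start=1):
        cumulative += share
        if cumulative >= required:
            return k
    return len(eigs)

print("Kaiser:", kaiser_count(eigenvalues))
print("Scree proxy:", scree_count(eigenvalues))
print("80% variance:", count_for_variance(eigenvalues, 0.80))
```

On these eigenvalues the Kaiser rule keeps two factors, while the drop-based and 80 percent rules both keep three, matching the two cases examined in the text.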

Following  the  selection  of  the  appropriate  number  of  factors,  a  factors  loadings 
matrix  was  calculated.  These  loadings  provided  insight  into  responses  that  could  be 
grouped  together  into  a  single  factor.  A  varimax  rotation  was  applied  to  the  response 
data  for  easier  interpretation  of  the  factors.  This  rotation  pushed  the  loadings  further 
apart  which  allowed  for  a  better  understanding  of  which  response  belonged  to  which 
factor/component.  The  factors  loadings  matrix  with  two  retained  factors  is  displayed  in 
Table  54  and  the  rotated  factors  loading  matrix  for  two  factors  is  given  in  Table  55. 


Table 54.  Factors Loadings Matrix for 2 Factors

        Factor 1    Factor 2
Time    0.4912      0.2192
TPF     -0.0489     0.9217
FPF     -0.82       -0.3071
TFP     -0.7646     0.4113

Table 55.  Rotated Factors Loadings Matrix for 2 Factors

        Rot. Factor 1    Rot. Factor 2
Time    0.5338           0.0664
TPF     0.2221           0.8959
FPF     -0.8739          -0.0546
TFP     -0.6113          0.6164
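
For readers who want to experiment with the rotation step, a common SVD-based formulation of raw varimax is sketched below, applied to the loadings of Table 54. This is an assumption-laden illustration: the software and options behind Table 55 are not stated here, the sketch applies no Kaiser normalization, and column order or signs may differ, so it is not claimed to reproduce Table 55 digit for digit.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation via repeated SVD; returns (rotated loadings, rotation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix of the varimax criterion
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break                              # no meaningful improvement
        objective = new_objective
    return loadings @ rotation, rotation

# Unrotated two-factor loadings from Table 54 (rows: Time, TPF, FPF, TFP)
table_54 = np.array([
    [0.4912, 0.2192],
    [-0.0489, 0.9217],
    [-0.82, -0.3071],
    [-0.7646, 0.4113],
])

rotated, rotation = varimax(table_54)
print(np.round(rotated, 4))
```

Because the rotation is orthogonal, each response's communality (the row sum of squared loadings) is unchanged; only the way it is split across factors moves.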

When examining the factors loadings matrix, each response was assigned to the
factor on which it loaded highest, regardless of sign. For two factors in AutoGAD,
Time, FPF, and TFP were combined into one factor and TPF remained its own factor.
For rotated factors, Time and FPF were grouped together while TPF and TFP were
combined into a single factor.

The same analysis was conducted when three factors were retained. The factors
loadings matrix for three factors is given in Table 56 and the rotated matrix is presented
in Table 57.


Table 56.  Factors Loadings Matrix for 3 Factors

        Factor 1    Factor 2    Factor 3
Time    0.4912      0.2192      0.836
TPF     -0.0489     0.9217      -0.2507
FPF     -0.82       -0.3071     0.2402
TFP     -0.7646     0.4113      0.2955

Table 57.  Rotated Factors Loadings Matrix for 3 Factors

        Rot. Factor 1    Rot. Factor 2    Rot. Factor 3
Time    0.0839           -0.013           0.9905
TPF     -0.0202          0.9561           -0.0176
FPF     -0.8168          -0.3382          -0.2071
TFP     -0.8505          0.3409           0.0378

Regardless of using factors or rotated factors, Table 56 and Table 57 both
indicated that Time and TPF were each their own factor and that FPF and TFP were
grouped into a common factor.
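
The grouping rule (assign each response to the factor with the largest absolute loading) can be applied directly to Tables 54 through 57. The helper name below is hypothetical.

```python
import numpy as np

RESPONSES = ["Time", "TPF", "FPF", "TFP"]

def assign_to_factors(loadings):
    """Map each response to the 1-based index of its largest |loading|."""
    best = np.argmax(np.abs(np.asarray(loadings)), axis=1)
    return {name: int(i) + 1 for name, i in zip(RESPONSES, best)}

table_54 = [[0.4912, 0.2192], [-0.0489, 0.9217], [-0.82, -0.3071], [-0.7646, 0.4113]]
table_55 = [[0.5338, 0.0664], [0.2221, 0.8959], [-0.8739, -0.0546], [-0.6113, 0.6164]]
table_56 = [[0.4912, 0.2192, 0.836], [-0.0489, 0.9217, -0.2507],
            [-0.82, -0.3071, 0.2402], [-0.7646, 0.4113, 0.2955]]
table_57 = [[0.0839, -0.013, 0.9905], [-0.0202, 0.9561, -0.0176],
            [-0.8168, -0.3382, -0.2071], [-0.8505, 0.3409, 0.0378]]

for label, table in [("Table 54", table_54), ("Table 55", table_55),
                     ("Table 56", table_56), ("Table 57", table_57)]:
    print(label, assign_to_factors(table))
```

Note that in Table 55 the TFP loadings on the two rotated factors are nearly equal in magnitude (0.6113 versus 0.6164), so its assignment to the second factor is a close call.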


Following the determination of the number of factors to retain, factor (and
rotated factor) scores were calculated. These scores represented a meaningful subspace of
the original response data. Once these scores were obtained, the proposed linear combination
techniques were applied to determine single dimensional response data for the AutoGAD
problem. For reference, the eight linear combination techniques applied are summarized
in Table 58.

Table 58.  Summary of Linear Combinations on Factor Scores

Sum Factor Scores
Sum Weighted Factor Scores
Sum Normalized Factor Scores
Sum Weighted Normalized Factor Scores
Sum Rotated Factor Scores
Sum Weighted Rotated Factor Scores
Sum Normalized Rotated Factor Scores
Sum Weighted Normalized Rotated Factor Scores

[Each entry is a summation over the retained factors; the formulas are those presented in
Table 17, Section 3.5.1.]
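
A rough sketch of how such combinations could be computed is given below. It is an assumption throughout: the exact formulas are those of Table 17 and are not reproduced here, so the choice of weights (each factor's share of variance explained) and of normalization (column-wise z-scores) is a guess made only for illustration, and the score values are hypothetical.

```python
import numpy as np

def combine_scores(scores, weights=None, normalize=False):
    """Collapse an (observations x factors) score matrix to one value per observation."""
    scores = np.asarray(scores, dtype=float)
    if normalize:
        # Assumed normalization: z-score each factor's scores
        scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    if weights is None:
        weights = np.ones(scores.shape[1])       # plain sum of factor scores
    return scores @ np.asarray(weights, dtype=float)

# Hypothetical factor scores for three design points and two retained factors
scores = [[0.8, -0.2], [-0.1, 1.1], [-0.7, -0.9]]
shares = [0.3752, 0.2903]  # assumed weights: variance explained by Factors 1 and 2 (Table 53)

print("sum:                ", combine_scores(scores))
print("weighted:           ", combine_scores(scores, weights=shares))
print("normalized:         ", combine_scores(scores, normalize=True))
print("weighted normalized:", combine_scores(scores, weights=shares, normalize=True))
```

The same function applied to rotated factor scores would give the four rotated variants.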

Once the single dimensional response data were calculated, modeling approaches
were applied to the data. A quadratic regression model and an ANN were fit to
the response data by utilizing the same CCD used on the responses separately, as seen in
Section 4.4 and Appendix B. This resulted in eight different quadratic models and eight
different ANNs for each set of factors (either two or three). After the appropriate
quadratic models and well-trained ANNs were obtained, a search was performed which
determined the optimal LT value and its corresponding settings.

To measure how well factor analysis performed, an alternate method of
combining the response data into a single dimension was examined. This involved
appropriately summing the standardized response data into a single dimension (Davis,
2009). This is the current method utilized in AutoGAD to combine the four responses
into a single dimension. The final output was calculated as:

$$\text{Combined} = \text{Time}_{(st)} - \text{TPF}_{(st)} + \text{FPF}_{(st)} - \text{TFP}_{(st)} \qquad (4.8)$$

Equation  (4.8)  was  applied  to  the  response  data  and  a  quadratic  regression  model 
was  used  to  fit  the  resultant  data.  The  LT  search  was  applied  to  the  model  and  optimal 
settings  were  calculated,  as  shown  in  Table  59. 
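
Equation (4.8) can be sketched as follows, assuming the subscript (st) denotes a z-score standardization of each response column (sample standard deviation). The function names and data values are hypothetical and are not AutoGAD results.

```python
import numpy as np

def standardize(values):
    """z-score a response column (assumed meaning of the (st) subscript)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

def combined_response(time, tpf, fpf, tfp):
    """Equation (4.8): minimized responses enter with +, maximized with -."""
    return standardize(time) - standardize(tpf) + standardize(fpf) - standardize(tfp)

# Three hypothetical runs, ordered from best to worst on every response
time = [1200.0, 1500.0, 1800.0]
tpf = [0.95, 0.90, 0.80]
fpf = [0.001, 0.002, 0.004]
tfp = [0.90, 0.85, 0.70]

combined = combined_response(time, tpf, fpf, tfp)
print(np.round(combined, 3))
```

Smaller combined values are better under this sign convention, so the first run, which is best on all four responses, receives the lowest value.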


Table 59.  LT Settings for Combined Response Data

            DA   MST  BWSNR  PTSNR  BWI   TBS  CS  SIH  SIL  LSNR  WS
Combined    -2   6    0.01   6      0.03  0    0   50   5    14    3

The optimal LT settings when quadratic regression was applied to the two factor
AutoGAD problem are displayed in Table 60. The optimal LT settings for ANNs on the
same two factor problem are given in Table 61. As seen in the two tables, settings among
the different linear combinations varied little. In addition, several techniques shared the
same optimal settings. However, modeling through quadratic regression differed from
modeling using ANNs in terms of optimal LT settings.


Table 60.  LT Settings for Two Factors using Quadratic Regression

           DA   MST  BWSNR  PTSNR  BWI   TBS  CS  SIH  SIL  LSNR  WS
FA1        -2   14   0.01   6      0.06  0    0   50   45   14    3
FA2        -2   14   0.01   6      0.07  0    0   50   35   14    3
FA3        -2   14   0.01   5      0.06  0    0   50   45   14    3
FA4        -2   14   0.01   6      0.07  0    0   50   35   14    3
Rot-FA1    -2   14   0.01   4      0.07  0    0   50   45   14    3
Rot-FA2    -2   14   0.01   6      0.07  0    0   50   45   14    3
Rot-FA3    -2   14   0.01   5      0.07  0    0   50   45   14    3
Rot-FA4    -2   14   0.01   6      0.07  0    0   50   45   14    3

Table 61.  LT Settings for Two Factors using ANNs

           DA   MST  BWSNR  PTSNR  BWI   TBS  CS  SIH  SIL  LSNR  WS
FA1        -2   6    0.1    6      0.1   1    1   50   5    14    3
FA2        -2   6    0.1    6      0.1   0    1   50   5    14    3
FA3        -2   6    0.1    6      0.01  0    1   50   45   14    3
FA4        -2   6    0.1    1      0.1   1    1   50   5    14    3
Rot-FA1    -2   6    0.1    6      0.01  1    1   150  5    14    3
Rot-FA2    -2   6    0.1    6      0.1   1    1   50   5    14    3
Rot-FA3    -2   6    0.1    6      0.01  1    1   150  5    14    3
Rot-FA4    -2   6    0.1    1      0.1   0    1   50   5    14    3

To  measure  the  performance  of  these  settings  versus  the  combined  settings 
calculated  in  Equation  (4.8),  the  settings  were  applied  to  the  untrained  images,  as  done  in 
Section  4.4.  Each  combination  of  settings  was  applied  to  each  of  the  eight  images.  The 
results  under  each  of  the  four  responses  were  averaged,  variance  calculated,  and  an  LT 
statistic  was  also  calculated. 

As  seen  in  Table  62,  all  factor  analysis  methods  outperformed  the  combined 
setting  results  in  terms  of  FPF  and  TFP.  Also,  most  of  the  methods  were  superior  in 
terms  of  time.  Only  two  of  the  rotated  factor  methods  were  better  than  the  combined 
settings  in  terms  of  TPF.  From  these  results,  a  strong  case  can  be  made  that  using  factor 
analysis  techniques  to  reduce  the  data  set  was  preferred  over  simply  summing  the 
standardized  response  data.  No  one  factor  (rotated)  method  clearly  outperformed 
another,  but  rather  all  outperformed  the  combined  settings  three  out  of  the  four  available 
responses almost every time.

Table 62.  LT Results for Two Factors

Method        Time        TPF         FPF         TFP
Combined      1456.387    -0.83286    0.002415    -0.01681

Quad. Reg.
  FA1         1334.624    -0.57549    1.58E-05    -0.48553
  FA2         1555.191    -0.55978    9.41E-06    -0.54581
  FA3         1551.976    -0.57082    2.2E-05     -0.45629
  FA4         1465.315    -0.56009    1.02E-05    -0.54023
  Rot-FA1     1400.619    -0.55851    1.05E-05    -0.52534
  Rot-FA2     1555.191    -0.55978    9.41E-06    -0.54581
  Rot-FA3     1447.621    -0.56272    1.17E-05    -0.51857
  Rot-FA4     1555.191    -0.55978    9.41E-06    -0.54581

ANNs
  FA1         1474.466    -0.69037    2.66E-05    -0.52894
  FA2         1522.444    -0.67242    1.49E-05    -0.63267
  FA3         1721.981    -0.81845    0.000914    -0.29161
  FA4         1441.736    -0.69715    4.25E-05    -0.49343
  Rot-FA1     1779.229    -0.84474    0.002124    -0.09883
  Rot-FA2     1474.466    -0.69037    2.66E-05    -0.52894
  Rot-FA3     1779.229    -0.84474    0.002124    -0.09883
  Rot-FA4     1350.419    -0.69106    3.09E-05    -0.56769

The same analysis was conducted using three retained factors to see if any
performance was gained by adding another factor. The optimal LT settings using
quadratic regression when three factors were retained are given in Table 63. Also, the LT
settings for the same three factors using ANNs are presented in Table 64. Once again,
little variation among the settings within each approach occurred, although there was
more variation compared to the settings in Table 60 and Table 61.

Table 63.  LT Settings for Three Factors using Quadratic Regression

           DA   MST  BWSNR  PTSNR  BWI   TBS  CS  SIH  SIL  LSNR  WS
FA1        -2   8    --     6      0.1   0    0   50   5    14    3
FA2        -2   9    --     6      0.1   0    0   50   5    14    3
FA3        -2   13   --     6      0.1   0    0   50   45   14    3
FA4        -2   14   0.1    1      0.1   0    1   150  5    4     3
Rot-FA1    -2   14   --     6      --    0    0   50   45   14    3
Rot-FA2    -2   14   --     6      --    0    0   50   35   14    3
Rot-FA3    -2   14   --     4      0.07  0    0   50   45   14    3
Rot-FA4    -2   10   0.1    6      0.1   0    0   150  5    14    3

(-- : value not legible in the source table)

Table 64.  LT Settings for Three Factors using ANNs

           DA   MST  BWSNR  PTSNR  BWI   TBS  CS  SIH  SIL  LSNR  WS
FA1        -2   6    0.1    6      0.01  0    1   50   5    14    3
FA2        -2   6    0.1    6      0.1   0    1   50   5    14    3
FA3        -2   6    0.1    6      0.1   1    1   50   5    14    3
FA4        -2   6    0.1    6      0.1   1    1   50   5    14    3
Rot-FA1    -2   6    0.1    6      0.01  0    1   150  45   4     3
Rot-FA2    -2   6    0.1    6      0.1   0    1   50   5    14    3
Rot-FA3    -2   6    0.1    6      0.01  0    1   50   5    14    3
Rot-FA4    -2   6    0.1    6      0.1   1    1   50   5    14    3

These settings were applied to the eight untrained images and their appropriate
statistics were collected. As seen in Table 65, stronger results were obtained when three
factors were retained, as compared to two. Once again, all methods outperformed the
combined settings in terms of FPF and TFP. In fact, the margin between the two was
much larger than when two factors were retained. TPF for the combined
settings remained superior, but the gap was reduced when using three factors. Finally,
half of the factor analysis methods outperformed the combined settings in terms of time,
while the other half remained competitive. Overall, the factor (rotated) methods were
preferred over the combined settings, regardless of the approach utilized, due to the
substantial reduction of LT values for FPF and TFP, while remaining better or
competitive in terms of time and TPF.


Table 65.  LT Results for Three Factors

Method        Time        TPF         FPF         TFP
Combined      1456.387    -0.83286    0.002415    -0.01681

Quad. Reg.
  FA1         1470.449    -0.61335    3.36E-05    -0.46849
  FA2         1444.256    -0.61528    3.33E-05    -0.45931
  FA3         1561.006    -0.61244    2.56E-05    -0.52579
  FA4         1540.971    -0.57881    2.11E-06    -0.73797
  Rot-FA1     1555.191    -0.55978    9.41E-06    -0.54581
  Rot-FA2     1465.315    -0.56009    1.02E-05    -0.54023
  Rot-FA3     1400.619    -0.55851    1.05E-05    -0.52534
  Rot-FA4     1528.943    -0.62177    2.84E-05    -0.53144

ANNs
  FA1         1532.336    -0.84592    0.001098    -0.15453
  FA2         1522.444    -0.67242    1.49E-05    -0.63267
  FA3         1474.466    -0.69037    2.66E-05    -0.52894
  FA4         1474.466    -0.69037    2.66E-05    -0.52894
  Rot-FA1     1602.298    -0.82344    0.00084     -0.30052
  Rot-FA2     1522.444    -0.67242    1.49E-05    -0.63267
  Rot-FA3     1721.981    -0.81845    0.000914    -0.29161
  Rot-FA4     1474.466    -0.69037    2.66E-05    -0.52894

The results reported in this section using factor analysis were strong in terms of
their use over simply summing standardized response data. Although superior
performance was not achieved in terms of all four outputs simultaneously, factor analysis
typically had improved results in three of the four outputs. Also, the difference in LT
scores for FPF and TFP was quite large when using factor analysis over simple
summation. Finally, ANNs provided a better fit to the data than quadratic regression, but
this situation could change depending on the nature of the problem being examined.

V.  Contributions and Future Research


5.1  Overview 

A  summary  of  the  contributions  made  to  the  field  of  applied  statistics  and  design 
of  experiments  through  the  research  conducted  and  presented  in  this  document  is 
provided  in  this  chapter.  A  list  of  potential  areas  for  further  investigation  related  to  this 
research  is  also  provided. 

5.2  Research  Contributions 

Several  contributions  in  the  fields  of  applied  statistics  and  design  of  experiments 
were  made  in  this  research.  Each  contribution  is  summarized. 

5.2.1  Doubly  Robust  Settings 

A gradient analysis was applied to the coefficients of derived process models.
This gradient search determined the worst possible system degradation that could occur,
through perturbations in the coefficients. Solving for robust control settings along this
gradient search allowed for future protection against unfavorable results due to
degradation. These doubly robust settings maintained their consistency in being robust to
noise in the system, as modeled in traditional RPD. A gradient search was developed
using quadratic regression, RBFNNs, or GRNNs. This gradient analysis is applicable to
any process model involving control and noise variables.

5.2.2  Artificial Neural Networks in RPD


Methods  for  utilizing  ANNs  in  RPD  were  discussed  in  this  dissertation.  Two 
methods,  depending  on  available  data,  were  derived  to  model  the  mean  and  variance 
models necessary for RPD. Experiments confirmed the usefulness of ANNs when
quadratic regression failed to fit highly non-linear models.

A  gradient  search  algorithm  was  developed  based  upon  the  weights  of  the  ANNs 
to  determine  doubly  robust  settings  if  quadratic  regression  is  inappropriate.  The  doubly 
robust  settings  in  ANNs  proved  as  effective,  if  not  more  so,  than  those  obtained  using 
quadratic  regression. 

5.2.3  Factor  Analysis  in  RPD 

An  alternative  set  of  methods  was  derived  to  reduce  multiple  response  problems 
to  a  single  dimension.  Ideally,  factor  analysis  would  retain  only  one  factor;  however,  if 
multiple  factors  still  remained,  linear  combinations  were  applied  to  reduce  the  application 
to  a  single  response.  Reduction  to  a  single  response  allowed  for  RPD  to  be  performed  in 
the  traditional  sense.  Factor  analysis  was  shown  to  be  more  effective  than  simply  a 
summation  of  standardized  response  data. 

5.3  Recommendations  for  Future  Research 

Several  areas  of  continued  research  are  suggested. 

5.3.1  Robust  Parameter  Design 

Throughout this research, the Lin & Tu (1998) (LT) methodology was applied to RPD and
doubly robust RPD. Table 1 presented alternative methods of solving the dual response
problem in RPD, and LT was selected as the author's choice. Other methods could have
provided more appropriate results depending on the problem. The differences among
these methods are left to be explored.
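For readers comparing dual response methods, the Lin & Tu criterion is commonly written as a mean squared error: the squared distance of the estimated mean from a target plus the estimated variance. The sketch below applies it to a response model with control variables x and noise variables z, using the standard mean and variance expressions for that model. All coefficients, the target, and the grid are hypothetical, and the example does not claim to match how the criterion was configured for the problems in this dissertation.

```python
import numpy as np

# Hypothetical fitted response model in coded units:
#   y = b0 + x'b + x'Bx + z'g + x'Dz + e,  E[z] = 0, Var[z] = Sz, Var[e] = s2
b0 = 30.0
b = np.array([-3.0, 2.0])
B = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([3.0])
D = np.array([[2.0], [-1.0]])      # control-by-noise interactions (2 x 1)
Sz = np.array([[1.0]])
s2 = 1.0
target = 28.0

def mean_model(x):
    """E[y | x]: the noise terms drop out because E[z] = 0."""
    return float(b0 + x @ b + x @ B @ x)

def variance_model(x):
    """Var[y | x]: noise transmitted through the slope dy/dz, plus error variance."""
    slope = g + D.T @ x
    return float(slope @ Sz @ slope + s2)

def mse(x):
    """Mean-squared-error criterion: squared bias from target plus variance."""
    return (mean_model(x) - target) ** 2 + variance_model(x)

# Coarse grid search over the coded design region [-1, 1]^2.
grid = np.linspace(-1.0, 1.0, 41)
candidates = np.array([[x1, x2] for x1 in grid for x2 in grid])
values = np.array([mse(x) for x in candidates])
best = candidates[values.argmin()]
print("best setting:", best, "MSE:", round(float(values.min()), 4))
```

A grid search keeps the example short; a constrained nonlinear optimizer would normally replace it. Swapping `mse` for a different objective is the only change needed to compare alternative dual response formulations on the same fitted model.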

5.3.2 Artificial Neural Networks

RBFNNs and GRNNs were selected as the neural networks applied in this
research. However, the feedforward neural network (FFNN) is another widely used
ANN in research. Typically used for classification purposes, this ANN could be tested for
its validity in RPD. In addition, a gradient analysis could be conducted on the weights of the
FFNN to determine doubly robust settings.

5.3.3 Multiple Responses in RPD

Eight linear combinations of factor analysis were developed to reduce multiple
responses into a single dimension. Further exploration of this concept could discover a
new combination technique that achieves superior and more consistent results.
Multiple-response RPD is a lightly researched area with the potential for
substantial contributions.

Appendix A. Summary of Example Problems


| Problem | Source | Variables | Responses | Application |
|---|---|---|---|---|
| Semiconductor Manufacturing | Myers & Montgomery (2002:566) | 2 control, 3 noise | Single (minimized) | Quadratic Regression in RPD; Doubly Robust (QR) |
| Semiconductor Manufacturing Extended (higher-order terms) | Myers & Montgomery (2002:566) | 2 control, 3 noise | Single (minimized) | ANNs in RPD; Doubly Robust (ANNs) |
| Force Transducer | Koksoy (2008) | 3 control, 2 noise | Non-linearity (u=l); Hysteresis (min) | QR vs ANNs in RPD |
| Notional 3^2 x 2^2 Design with 5 Responses | None | 2 control, 2 noise | Y1, Y2, Y3 (min); Y4, Y5 (max) | Factor Analysis |

| Problem (examined but not presented) | Source | Variables | Responses | Application |
|---|---|---|---|---|
| Color TV Images | Myers & Montgomery (2002:570) | 2 control, 2 noise | Single (maximized) | Quadratic Regression and ANNs in RPD; Doubly Robust (QR) |
| Hard Disk Drive Quality | Su & Tong (1997) | 5 control, 1 noise | PW (min); HFA (max); OW (max); PS (min) | QR vs ANNs in RPD |
| Notional 3^2 x 2^2 Design with 4 Responses (representative of Davis (2009) work) | None | 2 control, 2 noise | Y1, Y2 (max); Y3, Y4 (min) | Factor Analysis |

Appendix B. CCD for AutoGAD


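| Dim | Max | Bin  | Pt SNR | Bin ID | Thresh | Clean | High | Low | LSNR | Win | Fishers | %Tgt  | # Clusters |
|-----|-----|------|--------|--------|--------|-------|------|-----|------|-----|---------|-------|------------|
|  -2 |   6 |  0.1 |      6 |    0.1 |      1 |     0 |  150 |   5 |   14 |   3 |   -1.00 | -0.80 |      -0.26 |
|  -2 |   6 | 0.01 |      1 |   0.01 |      0 |     1 |  150 |  45 |   14 |  11 |    0.71 | -0.63 |      -1.00 |
|  -2 |  14 |  0.1 |      6 |    0.1 |      1 |     0 |   50 |   5 |    4 |   3 |    1.00 | -0.19 |      -0.99 |
|  -2 |  14 |  0.1 |      6 |    0.1 |      1 |     0 |  150 |   5 |    4 |  11 |    0.71 | -1.00 |       0.26 |
|  -2 |  14 |  0.1 |      1 |   0.01 |      0 |     0 |  150 |  45 |   14 |   3 |    1.00 | -1.00 |       1.00 |
|  -2 |  14 |  0.1 |      6 |    0.1 |      0 |     1 |  150 |  45 |    4 |   3 |    0.71 | -0.63 |      -1.00 |
|   2 |  10 | 0.06 |    3.5 |   0.06 |      1 |     0 |  100 |  25 |    9 |   7 |   -1.00 |  1.00 |      -0.79 |
|   2 |   6 |  0.1 |      1 |   0.01 |      1 |     0 |  150 |  45 |   14 |  11 |    0.43 | -0.84 |      -0.56 |
|   2 |  14 | 0.01 |      1 |    0.1 |      0 |     1 |   50 |  45 |    4 |   3 |    1.00 | -1.00 |       1.00 |
|   2 |  14 |  0.1 |      6 |   0.01 |      0 |     1 |   50 |   5 |   14 |   3 |    0.71 | -1.00 |       0.26 |
|   2 |  14 | 0.01 |      6 |    0.1 |      0 |     0 |   50 |   5 |   14 |  11 |    0.71 | -0.63 |      -1.00 |
|  -2 |   6 | 0.01 |      6 |   0.01 |      0 |     0 |  150 |   5 |    4 |   3 |   -1.00 | -0.80 |      -0.26 |
|  -2 |   6 | 0.01 |      6 |    0.1 |      1 |     0 |   50 |  45 |    4 |   3 |    0.43 | -0.84 |      -0.56 |
|   2 |  14 | 0.01 |      6 |   0.01 |      0 |     0 |   50 |  45 |   14 |   3 |    0.71 | -1.00 |       0.26 |
|   2 |  14 | 0.01 |      6 |    0.1 |      1 |     1 |   50 |  45 |    4 |  11 |   -1.00 |  1.00 |      -0.79 |
|   2 |  14 |  0.1 |      6 |   0.01 |      0 |     1 |   50 |   5 |    4 |  11 |    0.71 | -0.63 |      -1.00 |
|  -2 |   6 |  0.1 |      1 |    0.1 |      0 |     0 |   50 |  45 |   14 |   3 |    0.71 | -1.00 |       0.24 |
|   2 |   6 |  0.1 |      1 |   0.01 |      1 |     0 |  150 |  45 |   14 |  11 |    1.00 | -0.19 |      -0.99 |
|  -2 |   6 |  0.1 |      6 |   0.01 |      1 |     0 |   50 |   5 |   14 |   3 |   -1.00 | -0.80 |      -0.26 |
|  -2 |  14 | 0.01 |      1 |   0.01 |      1 |     0 |   50 |  45 |    4 |   3 |    0.71 | -1.00 |       0.26 |
|   2 |   6 | 0.01 |      6 |    0.1 |      1 |     0 |   50 |   5 |    4 |   3 |    0.71 | -1.00 |       0.24 |
|   2 |  14 |  0.1 |      1 |    0.1 |      0 |     1 |   50 |   5 |    4 |  11 |   -1.00 | -0.80 |      -0.26 |
|  -2 |  14 |  0.1 |      6 |   0.01 |      1 |     1 |   50 |  45 |   14 |  11 |    0.43 | -0.84 |      -0.56 |
|   2 |  14 | 0.01 |      6 |    0.1 |      1 |     0 |  150 |  45 |    4 |  11 |    0.71 | -0.63 |      -1.00 |
|   2 |  14 | 0.01 |      6 |   0.01 |      0 |     0 |   50 |   5 |   14 |  11 |    0.71 | -1.00 |       0.24 |
|   2 |   6 | 0.01 |      6 |   0.01 |      1 |     0 |   50 |  45 |   14 |   3 |    1.00 | -0.19 |      -0.99 |
|  -2 |  14 | 0.01 |      6 |   0.01 |      1 |     0 |  150 |  45 |    4 |   3 |    0.43 | -0.84 |      -0.56 |
|   2 |   6 |  0.1 |      6 |    0.1 |      0 |     0 |  150 |   5 |    4 |  11 |   -1.00 |  1.00 |      -0.79 |
|  -2 |   6 |  0.1 |      1 |   0.01 |      0 |     1 |  150 |   5 |    4 |  11 |    1.00 | -1.00 |       1.00 |
|  -2 |   6 |  0.1 |      1 |    0.1 |      0 |     0 |  150 |  45 |    4 |  11 |    0.71 | -1.00 |       0.26 |
|   2 |   6 |  0.1 |      6 |    0.1 |      1 |     1 |  150 |   5 |    4 |  11 |   -1.00 | -0.80 |      -0.26 |
|   2 |   6 |  0.1 |      6 |   0.01 |      1 |     1 |  150 |  45 |    4 |  11 |    0.43 | -0.84 |      -0.56 |
|  -2 |  14 | 0.01 |      6 |   0.01 |      0 |     1 |  150 |  45 |    4 |   3 |    1.00 | -1.00 |       1.00 |
|  -2 |   6 |  0.1 |      6 |   0.01 |      0 |     0 |  150 |  45 |   14 |   3 |    0.71 | -0.63 |      -1.00 |
|   2 |   6 | 0.01 |      6 |    0.1 |      1 |     0 |   50 |   5 |    4 |   3 |   -1.00 | -0.80 |      -0.26 |
|   2 |  14 | 0.01 |      6 |   0.01 |      0 |     0 |  150 |  45 |    4 |   3 |    0.71 | -1.00 |       0.26 |
|   2 |  14 |  0.1 |      6 |   0.01 |      1 |     1 |  150 |   5 |    4 |   3 |    1.00 | -1.00 |       1.00 |
|   2 |  14 |  0.1 |      1 |   0.01 |      1 |     0 |   50 |   5 |   14 |   3 |   -1.00 | -0.80 |      -0.26 |
|  -2 |  14 | 0.01 |      6 |   0.01 |      0 |     1 |  150 |   5 |    4 |   3 |    1.00 | -1.00 |       1.00 |
|  -2 |   6 |  0.1 |      1 |   0.01 |      1 |     1 |  150 |  45 |   14 |   3 |    0.71 | -0.63 |      -1.00 |
|  -2 |  14 |  0.1 |      6 |    0.1 |      0 |     1 |   50 |   5 |    4 |   3 |   -1.00 |  1.00 |      -0.79 |
|  -2 |   6 |  0.1 |      6 |    0.1 |      0 |     1 |  150 |   5 |   14 |   3 |    0.71 | -1.00 |       0.24 |
|   0 |  10 | 0.06 |    3.5 |   0.06 |      0 |     1 |  100 |   5 |    9 |   7 |    0.71 | -1.00 |       0.24 |
|   2 |  14 | 0.01 |      6 |    0.1 |      0 |     0 |   50 |   5 |   14 |   3 |    0.43 | -0.84 |      -0.56 |
|  -2 |   6 |  0.1 |      1 |    0.1 |      0 |     1 |  150 |   5 |    4 |  11 |   -1.00 | -0.80 |      -0.26 |
|  -2 |   6 | 0.01 |      6 |   0.01 |      1 |     0 |   50 |   5 |    4 |   3 |    0.71 | -0.63 |      -1.00 |
|  -2 |   6 |  0.1 |      1 |    0.1 |      1 |     1 |  150 |  45 |    4 |   3 |   -1.00 |  1.00 |      -0.79 |
|   2 |   6 | 0.01 |      1 |    0.1 |      0 |     1 |   50 |  45 |    4 |  11 |    0.71 | -1.00 |       0.24 |
|   0 |  10 | 0.06 |    3.5 |    0.1 |      1 |     0 |  100 |  25 |    9 |   7 |    0.71 | -1.00 |       0.26 |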